diff --git a/COPYING b/COPYING new file mode 100644 index 000000000..e963df829 --- /dev/null +++ b/COPYING @@ -0,0 +1,622 @@ + GNU GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU General Public License is a free, copyleft license for +software and other kinds of works. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +the GNU General Public License is intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. We, the Free Software Foundation, use the +GNU General Public License for most of our software; it applies also to +any other work released this way by its authors. You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + To protect your rights, we need to prevent others from denying you +these rights or asking you to surrender the rights. Therefore, you have +certain responsibilities if you distribute copies of the software, or if +you modify it: responsibilities to respect the freedom of others. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must pass on to the recipients the same +freedoms that you received. You must make sure that they, too, receive +or can get the source code. And you must show them these terms so they +know their rights. + + Developers that use the GNU GPL protect your rights with two steps: +(1) assert copyright on the software, and (2) offer you this License +giving you legal permission to copy, distribute and/or modify it. + + For the developers' and authors' protection, the GPL clearly explains +that there is no warranty for this free software. For both users' and +authors' sake, the GPL requires that modified versions be marked as +changed, so that their problems will not be attributed erroneously to +authors of previous versions. + + Some devices are designed to deny users access to install or run +modified versions of the software inside them, although the manufacturer +can do so. This is fundamentally incompatible with the aim of +protecting users' freedom to change the software. The systematic +pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we +have designed this version of the GPL to prohibit the practice for those +products. If such problems arise substantially in other domains, we +stand ready to extend this provision to those domains in future versions +of the GPL, as needed to protect the freedom of users. + + Finally, every program is threatened constantly by software patents. +States should not allow patents to restrict development and use of +software on general-purpose computers, but in those that do, we wish to +avoid the special danger that patents applied to a free program could +make it effectively proprietary. 
To prevent this, the GPL assures that +patents cannot be used to render the program non-free. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. + + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. 
However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. + + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. 
+ + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. + + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. 
+ + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. + + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. 
Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. + + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. 
+ + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. 
For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. + + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. 
You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. 
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
diff --git a/INSTALL b/INSTALL
new file mode 100644
index 000000000..63b99f4ea
--- /dev/null
+++ b/INSTALL
@@ -0,0 +1,109 @@
+== Basic build ==
+
+1. Edit config.mk. Follow the comments there.
+ Optionally you can change compiler settings in include_[GCC|ICC].mk.
+ Please note that only the default compiler flags are supported and tested.
+2. make
+2.a make likwid-bench (if you want to build and install likwid-bench)
+3. make install (this is required for likwid-pin and if you use the accessDaemon)
+4. Set up access to the msr device files (see the end of this document).
+
+Only the default set of flags is tested. As it is not possible to test all
+compiler setting variants, the Intel icc compiler is only build-tested; a
+basic function test is done for the icc binary. The only fully tested variant
+is gcc with the default compiler flags, which is therefore the recommended
+setup. If you want to build and use the Fortran interface, you can mix GCC
+with the Intel Fortran Compiler. More information on this can be found in
+the WIKI.
+
+*NOTICE*
+
+All generated files are located in the [GCC|ICC] build directory.
+This includes the dependency files, the object files, and also the
+generated source files as well as the pas and assembly files for
+likwid-bench. When debugging your likwid-bench benchmarks you can
+therefore inspect all intermediate build files and the final assembly code.
+
+== Known problems ==
+
+On very old systems with old kernels (< 2.6.7) or old glibc versions, likwid
+is built with reduced functionality. This includes missing support for NUMA
+and pinning.
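+
+== Example build session ==
+
+For reference, a complete build with the default GCC configuration could
+look like the following sketch. The commands merely replay the steps above;
+whether you need root rights for 'make install' depends on the PREFIX you
+chose in config.mk:
+
+$EDITOR config.mk      # set COMPILER, PREFIX and, if needed, ACCESSDAEMON
+make                   # build the library and the command line tools
+make likwid-bench      # optional, see "Build likwid-bench" below
+make install           # required for likwid-pin and the accessDaemon
+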
+
+== Additional Targets ==
+
+make clean     - clean the object directory
+make distclean - also clean the executables/libraries
+make uninstall - delete installed files
+
+== Build likwid-bench ==
+
+To build likwid-bench you have to explicitly call:
+
+make likwid-bench
+
+This is because likwid-bench does not compile on 32-bit systems.
+
+== Build accessDaemon ==
+
+To build the accessDaemon:
+
+1. Edit config.mk and configure the path in the ACCESSDAEMON variable.
+2. Set the desired default ACCESSMODE. You can overwrite this on the command line.
+3. make will also build the accessDaemon.
+4. Install with
+ make install
+
+With the standard make install target the daemon will also be installed in
+${PREFIX}/bin . Don't forget to copy the daemon if you configured a different
+path in ACCESSDAEMON.
+
+== Setup of msr module ==
+
+likwid-perfctr, likwid-powermeter and likwid-features require the Linux msr kernel module. This module
+is part of most standard distro kernels. You have to be root to do the initial setup.
+
+1. Check whether the msr module is loaded with 'lsmod | grep msr'. There should be some output.
+2. If the module is not loaded, load it with 'modprobe msr'. For automatic loading at startup,
+consult your distro's documentation on how to do so.
+3. Adapt the access rights on the msr device files for normal users. To allow everybody access you can
+use 'chmod o+rw /dev/cpu/*/msr'. This is only recommended on safe single-user desktop systems.
+
+Because general access to the msr registers is not desired on security-sensitive
+systems, you can instead implement more sophisticated access right settings,
+e.g. with setgid. A common solution, used for many other device files, e.g. for
+audio, is to introduce a group and chown the msr device files to that
+group. If you then install likwid-perfctr setgid on that group, the
+executing user can use the tool but cannot directly write or read the msr
+device files (this setup is sketched at the end of this section).
+
+A secure solution is to use the accessDaemon, which encapsulates the access to
+the msr device files and performs an address check for allowed registers. For
+more information on how to set up and use this solution, have a look at the WIKI
+page:
+
+http://code.google.com/p/likwid/wiki/MSRDaemon
+
+A demo of a root exploit involving the msr device files was published. As
+a consequence, the security settings for access to the msr device files have
+been tightened in recent kernels.
+
+Just setting the file access rights or using suid root on the access daemon is
+no longer sufficient. You now have to register your binary to get access.
+This is only necessary if the above setup does not work.
+
+You register the necessary capability by calling
+
+sudo setcap cap_sys_rawio+ep EXECUTABLE
+
+on the executables. This is only possible on local file systems.
+The most feasible way is therefore to register likwid-accessD and proxy all accesses through it.
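+
+For illustration, the group-based setup described above could look like the
+following minimal sketch. The group name 'msr' is an assumption and ${PREFIX}
+stands for your install prefix; also note that udev may reset the device file
+permissions on reboot, so you may have to reapply them (e.g. via a udev rule):
+
+groupadd msr                              # create a dedicated group
+chown root:msr /dev/cpu/*/msr             # hand the device files to it
+chmod 660 /dev/cpu/*/msr                  # group members may read/write
+chgrp msr ${PREFIX}/bin/likwid-perfctr    # setgid the tool on that group
+chmod g+s ${PREFIX}/bin/likwid-perfctr
+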
+
+If you still have problems, please let me know on the likwid mailing list:
+
+http://groups.google.com/group/likwid-users
+
+
+
+
+
diff --git a/Makefile b/Makefile
new file mode 100644
index 000000000..abcdf6c37
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,242 @@
+# =======================================================================================
+#
+# Filename: Makefile
+#
+# Description: Central Makefile
+#
+# Version:
+# Released:
+#
+# Author: Jan Treibig (jt), jan.treibig@gmail.com
+# Project: likwid
+#
+# Copyright (C) 2013 Jan Treibig
+#
+# This program is free software: you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free Software
+# Foundation, either version 3 of the License, or (at your option) any later
+# version.
+#
+# This program is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
+# PARTICULAR PURPOSE. See the GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along with
+# this program. If not, see <http://www.gnu.org/licenses/>.
+#
+# =======================================================================================
+
+SRC_DIR = ./src
+DOC_DIR = ./doc
+GROUP_DIR = ./groups
+FILTER_DIR = ./filters
+MAKE_DIR = ./make
+EXT_TARGETS = ./ext/lua ./ext/hwloc ./src/libwid
+
+#DO NOT EDIT BELOW
+
+
+# Dependency chains:
+# *.[ch] -> *.o -> executables
+# *.ptt -> *.pas -> *.s -> *.o -> executables
+# *.txt -> *.h (generated)
+
+include ./config.mk
+include $(MAKE_DIR)/include_$(COMPILER).mk
+include $(MAKE_DIR)/config_checks.mk
+include $(MAKE_DIR)/config_defines.mk
+
+INCLUDES += -I./src/includes -I./ext/lua/includes -I./ext/hwloc/include -I$(BUILD_DIR)
+LIBS +=
+
+#CONFIGURE BUILD SYSTEM
+BUILD_DIR = ./$(COMPILER)
+Q ?= @
+GENGROUPLOCK = .gengroup
+
+ifeq ($(COMPILER),MIC)
+BENCH_DIR = ./bench/phi
+else
+ifeq ($(COMPILER),GCCX86)
+BENCH_DIR = ./bench/x86
+else
+BENCH_DIR = ./bench/x86-64
+endif
+endif
+
+ifeq ($(SHARED_LIBRARY),true)
+CFLAGS += $(SHARED_CFLAGS)
+LIBS += -L.
-llikwid -lm +DYNAMIC_TARGET_LIB := liblikwid.so +TARGET_LIB := $(DYNAMIC_TARGET_LIB) +else +STATIC_TARGET_LIB := liblikwid.a +TARGET_LIB := $(STATIC_TARGET_LIB) +endif + + +VPATH = $(SRC_DIR) +OBJ = $(patsubst $(SRC_DIR)/%.c, $(BUILD_DIR)/%.o,$(wildcard $(SRC_DIR)/*.c)) +OBJ += $(patsubst $(SRC_DIR)/%.cc, $(BUILD_DIR)/%.o,$(wildcard $(SRC_DIR)/*.cc)) +PERFMONHEADERS = $(patsubst $(SRC_DIR)/includes/%.txt, $(BUILD_DIR)/%.h,$(wildcard $(SRC_DIR)/includes/*.txt)) +OBJ_BENCH = $(patsubst $(BENCH_DIR)/%.ptt, $(BUILD_DIR)/%.o,$(wildcard $(BENCH_DIR)/*.ptt)) +OBJ_LUA = $(wildcard ./ext/lua/$(COMPILER)/*.o) +OBJ_HWLOC = $(wildcard ./ext/hwloc/$(COMPILER)/*.o) +OBJ_LIBWID = $(wildcard ./src/libwid/$(COMPILER)/*.o) + +APPS = likwid-perfctr \ + likwid-features \ + likwid-powermeter \ + likwid-memsweeper \ + likwid-topology \ + likwid-genCfg \ + likwid-pin \ + likwid-bench + +LIBWID = libwid.a +LIBHWLOC = ext/hwloc/libhwloc.a + +CPPFLAGS := $(CPPFLAGS) $(DEFINES) $(INCLUDES) + +all: $(BUILD_DIR) $(GENGROUPLOCK) $(PERFMONHEADERS) $(OBJ) $(OBJ_BENCH) $(EXT_TARGETS) $(STATIC_TARGET_LIB) $(DYNAMIC_TARGET_LIB) $(APPS) $(FORTRAN_INTERFACE) $(PINLIB) $(DAEMON_TARGET) + +tags: + @echo "===> GENERATE TAGS" + $(Q)ctags -R + +$(APPS): $(addprefix $(SRC_DIR)/applications/,$(addsuffix .c,$(APPS))) $(BUILD_DIR) $(GENGROUPLOCK) $(OBJ) $(OBJ_BENCH) + @echo "===> LINKING $@" + $(Q)${CC} $(CFLAGS) $(ANSI_CFLAGS) $(CPPFLAGS) ${LFLAGS} -o $@ $(addprefix $(SRC_DIR)/applications/,$(addsuffix .c,$@)) $(OBJ_BENCH) $(TARGET_LIB) $(LIBHWLOC) $(LIBS) + +$(STATIC_TARGET_LIB): $(OBJ) + @echo "===> CREATE STATIC LIB $(STATIC_TARGET_LIB)" + $(Q)${AR} -cq $(STATIC_TARGET_LIB) $(OBJ) $(OBJ_HWLOC) + +$(LIBWID): $(OBJ_LUA) $(OBJ_HWLOC) $(OBJ_LIBWID) + @echo "===> CREATE STATIC LIB $(LIBWID)" + $(Q)${AR} -cq $(LIBWID) $(OBJ_LUA) $(OBJ_HWLOC) $(OBJ_LIBWID) + + +$(DYNAMIC_TARGET_LIB): $(OBJ) + @echo "===> CREATE SHARED LIB $(DYNAMIC_TARGET_LIB)" + $(Q)${CC} $(SHARED_LFLAGS) $(SHARED_CFLAGS) -o $(DYNAMIC_TARGET_LIB) $(OBJ) $(OBJ_HWLOC) + +$(DAEMON_TARGET): $(SRC_DIR)/access-daemon/accessDaemon.c + @echo "===> Build access daemon likwid-accessD" + $(Q)$(MAKE) -C $(SRC_DIR)/access-daemon + +$(BUILD_DIR): + @mkdir $(BUILD_DIR) + +$(PINLIB): + @echo "===> CREATE LIB $(PINLIB)" + $(Q)$(MAKE) -s -C src/pthread-overload/ $(PINLIB) + +$(GENGROUPLOCK): $(foreach directory,$(shell ls $(GROUP_DIR)), $(wildcard $(GROUP_DIR)/$(directory)/*.txt)) + @echo "===> GENERATE GROUP HEADERS" + $(Q)$(GEN_GROUPS) ./groups $(BUILD_DIR) ./perl/templates + $(Q)touch $(GENGROUPLOCK) + +$(FORTRAN_INTERFACE): $(SRC_DIR)/likwid.f90 + @echo "===> COMPILE FORTRAN INTERFACE $@" + $(Q)$(FC) -c $(FCFLAGS) $< + @rm -f likwid.o + +$(EXT_TARGETS): + @echo "===> ENTER $@" + $(Q)$(MAKE) --no-print-directory -C $@ $(MAKECMDGOALS) + +#PATTERN RULES +$(BUILD_DIR)/%.o: %.c + @echo "===> COMPILE $@" + $(Q)$(CC) -c $(CFLAGS) $(ANSI_CFLAGS) $(CPPFLAGS) $< -o $@ + $(Q)$(CC) $(CPPFLAGS) -MT $(@:.d=.o) -MM $< > $(BUILD_DIR)/$*.d + +$(BUILD_DIR)/%.o: %.cc + @echo "===> COMPILE $@" + $(Q)$(CXX) -c $(CXXFLAGS) $(CPPFLAGS) $< -o $@ + $(Q)$(CXX) $(CXXFLAGS) $(CPPFLAGS) -MT $(@:.d=.o) -MM $< > $(BUILD_DIR)/$*.d + + +$(BUILD_DIR)/%.pas: $(BENCH_DIR)/%.ptt + @echo "===> GENERATE BENCHMARKS" + $(Q)$(GEN_PAS) $(BENCH_DIR) $(BUILD_DIR) ./perl/templates + +$(BUILD_DIR)/%.h: $(SRC_DIR)/includes/%.txt + @echo "===> GENERATE HEADER $@" + $(Q)$(GEN_PMHEADER) $< $@ + +$(BUILD_DIR)/%.o: $(BUILD_DIR)/%.pas + @echo "===> ASSEMBLE $@" + $(Q)$(PAS) -i $(PASFLAGS) -o $(BUILD_DIR)/$*.s $< '$(DEFINES)' + 
$(Q)$(AS) $(ASFLAGS) $(BUILD_DIR)/$*.s -o $@
+
+ifeq ($(findstring $(MAKECMDGOALS),clean),)
+-include $(OBJ:.o=.d)
+endif
+
+.PHONY: clean distclean install uninstall $(EXT_TARGETS)
+
+
+.PRECIOUS: $(BUILD_DIR)/%.pas
+
+.NOTPARALLEL:
+
+
+clean: $(EXT_TARGETS)
+ @echo "===> CLEAN"
+ @rm -rf $(BUILD_DIR)
+ @rm -f $(GENGROUPLOCK)
+
+distclean: clean
+ @echo "===> DIST CLEAN"
+ @rm -f likwid-*
+ @rm -f $(STATIC_TARGET_LIB)
+ @rm -f $(DYNAMIC_TARGET_LIB)
+ @rm -f $(FORTRAN_INTERFACE)
+ @rm -f $(PINLIB)
+ @rm -f tags
+
+install:
+ @echo "===> INSTALL applications to $(PREFIX)/bin"
+ @mkdir -p $(PREFIX)/bin
+ @cp -f likwid-* $(PREFIX)/bin
+ @cp -f perl/feedGnuplot $(PREFIX)/bin
+ @cp -f perl/likwid-* $(PREFIX)/bin
+ @chmod 755 $(PREFIX)/bin/likwid-*
+ @echo "===> INSTALL man pages to $(MANPREFIX)/man1"
+ @mkdir -p $(MANPREFIX)/man1
+ @sed -e "s/<VERSION>/$(VERSION)/g" -e "s/<DATE>/$(DATE)/g" < $(DOC_DIR)/likwid-topology.1 > $(MANPREFIX)/man1/likwid-topology.1
+ @sed -e "s/<VERSION>/$(VERSION)/g" -e "s/<DATE>/$(DATE)/g" < $(DOC_DIR)/likwid-features.1 > $(MANPREFIX)/man1/likwid-features.1
+ @sed -e "s/<VERSION>/$(VERSION)/g" -e "s/<DATE>/$(DATE)/g" < $(DOC_DIR)/likwid-perfctr.1 > $(MANPREFIX)/man1/likwid-perfctr.1
+ @sed -e "s/<VERSION>/$(VERSION)/g" -e "s/<DATE>/$(DATE)/g" < $(DOC_DIR)/likwid-powermeter.1 > $(MANPREFIX)/man1/likwid-powermeter.1
+ @sed -e "s/<VERSION>/$(VERSION)/g" -e "s/<DATE>/$(DATE)/g" < $(DOC_DIR)/likwid-pin.1 > $(MANPREFIX)/man1/likwid-pin.1
+ @chmod 644 $(MANPREFIX)/man1/likwid-*
+ @echo "===> INSTALL headers to $(PREFIX)/include"
+ @mkdir -p $(PREFIX)/include
+ @cp -f src/includes/likwid*.h $(PREFIX)/include/
+ $(FORTRAN_INSTALL)
+ @echo "===> INSTALL libraries to $(PREFIX)/lib"
+ @mkdir -p $(PREFIX)/lib
+ @cp -f liblikwid* $(PREFIX)/lib
+ @chmod 755 $(PREFIX)/lib/$(PINLIB)
+ @echo "===> INSTALL filters to $(LIKWIDFILTERPATH)"
+ @mkdir -p $(LIKWIDFILTERPATH)
+ @cp -f filters/* $(LIKWIDFILTERPATH)
+ @chmod 755 $(LIKWIDFILTERPATH)/*
+
+uninstall:
+ @echo "===> REMOVING applications from $(PREFIX)/bin"
+ @rm -f $(addprefix $(PREFIX)/bin/,$(APPS))
+ @rm -f $(PREFIX)/bin/likwid-mpirun
+ @rm -f $(PREFIX)/bin/likwid-perfscope
+ @rm -f $(PREFIX)/bin/feedGnuplot
+ @echo "===> REMOVING man pages from $(MANPREFIX)/man1"
+ @rm -f $(addprefix $(MANPREFIX)/man1/,$(addsuffix .1,$(APPS)))
+ @echo "===> REMOVING libs from $(PREFIX)/lib"
+ @rm -f $(PREFIX)/lib/liblikwid*
+ @echo "===> REMOVING filters from $(PREFIX)/share"
+ @rm -rf $(PREFIX)/share/likwid
+
+
+
diff --git a/README b/README
new file mode 100644
index 000000000..fa1e85cb6
--- /dev/null
+++ b/README
@@ -0,0 +1,27 @@
+Likwid is a toolsuite of command line applications for performance-oriented
+programmers that is simple to install and use. It works for Intel and AMD
+processors on the Linux operating system.
+
+It consists of:
+
+likwid-topology - print thread and cache topology
+likwid-features - view and toggle feature registers on Intel processors
+likwid-perfctr - configure and read out hardware performance counters on Intel and AMD processors
+likwid-powermeter - read out RAPL Energy information and get info about Turbo Mode steps
+likwid-pin - pin your threaded application (pthread, Intel and gcc OpenMP) to dedicated processors
+likwid-bench - Microbenchmarking platform
+likwid-gencfg - Dumps topology information to a file
+likwid-mpirun - Wrapper to start MPI and hybrid MPI/OpenMP applications (supports Intel MPI and OpenMPI)
+likwid-scope - Frontend to the timeline mode of likwid-perfctr, plots live graphs of performance metrics
+
+For detailed documentation on the usage of the tools, have a look at the
+likwid wiki pages at:
+
+http://code.google.com/p/likwid/wiki/Introduction
+
+If you have problems or suggestions, please let me know on the likwid mailing list:
+
+http://groups.google.com/group/likwid-users
+
+
+
diff --git a/bench/phi/copy.ptt b/bench/phi/copy.ptt
new file mode 100644
index 000000000..81622bf04
--- /dev/null
+++ b/bench/phi/copy.ptt
@@ -0,0 +1,13 @@
+STREAMS 2
+TYPE DOUBLE
+FLOPS 0
+BYTES 16
+LOOP 32
+vmovaps zmm0, [STR0 + GPR1 * 8]
+vmovaps zmm1, [STR0 + GPR1 * 8 + 64]
+vmovaps zmm2, [STR0 + GPR1 * 8 + 128]
+vmovaps zmm3, [STR0 + GPR1 * 8 + 192]
+vmovaps [STR1 + GPR1 * 8] , zmm0
+vmovaps [STR1 + GPR1 * 8 + 64], zmm1
+vmovaps [STR1 + GPR1 * 8 + 128], zmm2
+vmovaps [STR1 + GPR1 * 8 + 192], zmm3
diff --git a/bench/phi/copy_mem.ptt b/bench/phi/copy_mem.ptt
new file mode 100644
index 000000000..3891a38cd
--- /dev/null
+++ b/bench/phi/copy_mem.ptt
@@ -0,0 +1,19 @@
+STREAMS 2
+TYPE DOUBLE
+FLOPS 0
+BYTES 16
+LOOP 32
+vprefetch0 [STR0 + GPR1 * 8 + 1024]
+vmovaps zmm0, [STR0 + GPR1 * 8]
+vmovaps zmm1, [STR0 + GPR1 * 8 + 64]
+vmovaps zmm2, [STR0 + GPR1 * 8 + 128]
+vmovaps zmm3, [STR0 + GPR1 * 8 + 192]
+vprefetch0 [STR1 + GPR1 * 8 + 1024]
+vmovnrngoaps [STR1 + GPR1 * 8], zmm0
+clevict1 [STR1 + GPR1 * 8]
+vmovnrngoaps [STR1 + GPR1 * 8 + 64], zmm1
+clevict1 [STR1 + GPR1 * 8 + 64]
+vmovnrngoaps [STR1 + GPR1 * 8 + 128], zmm2
+clevict1 [STR1 + GPR1 * 8 + 128]
+vmovnrngoaps [STR1 + GPR1 * 8 + 192], zmm3
+clevict1 [STR1 + GPR1 * 8 + 192]
diff --git a/bench/phi/copy_p0.ptt b/bench/phi/copy_p0.ptt
new file mode 100644
index 000000000..49527a245
--- /dev/null
+++ b/bench/phi/copy_p0.ptt
@@ -0,0 +1,17 @@
+STREAMS 2
+TYPE DOUBLE
+FLOPS 0
+BYTES 16
+LOOP 32
+vprefetch1 [STR0 + GPR1 * 8 + 2048]
+vprefetch0 [STR0 + GPR1 * 8 + 256]
+vmovaps zmm0, [STR0 + GPR1 * 8]
+vmovaps zmm1, [STR0 + GPR1 * 8 + 64]
+vmovaps zmm2, [STR0 + GPR1 * 8 + 128]
+vmovaps zmm3, [STR0 + GPR1 * 8 + 192]
+vprefetche1 [STR1 + GPR1 * 8 + 2048]
+vprefetche0 [STR1 + GPR1 * 8 + 256]
+vmovaps [STR1 + GPR1 * 8] , zmm0
+vmovaps [STR1 + GPR1 * 8 + 64], zmm1
+vmovaps [STR1 + GPR1 * 8 + 128], zmm2
+vmovaps [STR1 + GPR1 * 8 + 192], zmm3
diff --git a/bench/phi/copy_p1.ptt b/bench/phi/copy_p1.ptt
new file mode 100644
index 000000000..c129b4db1
--- /dev/null
+++ b/bench/phi/copy_p1.ptt
@@ -0,0 +1,38 @@
+STREAMS 2
+TYPE DOUBLE
+FLOPS 0
+BYTES 16
+vprefetch0 [STR0 + GPR1 * 8]
+vprefetch0 [STR0 + GPR1 * 8 + 256]
+vprefetch0 [STR0 + GPR1 * 8 + 512]
+vprefetch0 [STR0 + GPR1 * 8 + 768]
+vprefetche0 [STR1 + GPR1 * 8 ]
+vprefetche0 [STR1 + GPR1 * 8 + 256]
+LOOP 32
+vmovaps zmm0, [STR0 + GPR1 * 8]
+vprefetch1 [STR0 + GPR1 * 8 + 2048]
+vmovaps zmm1, [STR0 + GPR1 * 8 + 64]
+vprefetch0 [STR0 + GPR1 * 8 + 1024]
+vmovaps zmm2, [STR0 + GPR1 * 8 +
128] +vprefetche1 [STR1 + GPR1 * 8 + 2048] +vmovaps zmm3, [STR0 + GPR1 * 8 + 192] +vprefetche0 [STR1 + GPR1 * 8 + 1024] +vmovaps [STR1 + GPR1 * 8] , zmm0 +vprefetch1 [STR0 + GPR1 * 8 + 2112] +vmovaps [STR1 + GPR1 * 8 + 64], zmm1 +vprefetch0 [STR0 + GPR1 * 8 + 1088] +vmovaps [STR1 + GPR1 * 8 + 128], zmm2 +vprefetche1 [STR1 + GPR1 * 8 + 2112] +vmovaps [STR1 + GPR1 * 8 + 192], zmm3 +vprefetche0 [STR1 + GPR1 * 8 + 1088] +vprefetch1 [STR0 + GPR1 * 8 + 2176] +vprefetch0 [STR0 + GPR1 * 8 + 1152] +vprefetche1 [STR1 + GPR1 * 8 + 2176] +vprefetche0 [STR1 + GPR1 * 8 + 1152] +vprefetch1 [STR0 + GPR1 * 8 + 2240] +vprefetch0 [STR0 + GPR1 * 8 + 1216] +vprefetche1 [STR1 + GPR1 * 8 + 2240] +vprefetche0 [STR1 + GPR1 * 8 + 1216] + + + diff --git a/bench/phi/load.ptt b/bench/phi/load.ptt new file mode 100644 index 000000000..e8367c277 --- /dev/null +++ b/bench/phi/load.ptt @@ -0,0 +1,10 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +LOOP 32 +vprefetch0 [STR0 + GPR1 * 8 + 1024] +vmovaps zmm0, [STR0 + GPR1 * 8] +vmovaps zmm1, [STR0 + GPR1 * 8 + 64] +vmovaps zmm2, [STR0 + GPR1 * 8 + 128] +vmovaps zmm3, [STR0 + GPR1 * 8 + 192] diff --git a/bench/phi/store.ptt b/bench/phi/store.ptt new file mode 100644 index 000000000..533501c07 --- /dev/null +++ b/bench/phi/store.ptt @@ -0,0 +1,14 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +vmovaps zmm0, [SCALAR] +vmovaps zmm1, [SCALAR] +vmovaps zmm2, [SCALAR] +vmovaps zmm3, [SCALAR] +LOOP 32 +vprefetch0 [STR0 + GPR1 * 8 + 1024] +vmovaps [STR0 + GPR1 * 8] , zmm0 +vmovaps [STR0 + GPR1 * 8 + 64], zmm1 +vmovaps [STR0 + GPR1 * 8 + 128], zmm2 +vmovaps [STR0 + GPR1 * 8 + 192], zmm3 diff --git a/bench/phi/store_mem.ptt b/bench/phi/store_mem.ptt new file mode 100644 index 000000000..fa8d2625f --- /dev/null +++ b/bench/phi/store_mem.ptt @@ -0,0 +1,18 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +vmovaps zmm0, [SCALAR] +vmovaps zmm1, [SCALAR] +vmovaps zmm2, [SCALAR] +vmovaps zmm3, [SCALAR] +LOOP 32 +vprefetch0 [STR0 + GPR1 * 8 + 1024] +vmovnrngoaps [STR0 + GPR1 * 8], zmm0 +clevict1 [STR0 + GPR1 * 8] +vmovnrngoaps [STR0 + GPR1 * 8 + 64], zmm1 +clevict1 [STR0 + GPR1 * 8 + 64] +vmovnrngoaps [STR0 + GPR1 * 8 + 128], zmm2 +clevict1 [STR0 + GPR1 * 8 + 128] +vmovnrngoaps [STR0 + GPR1 * 8 + 192], zmm3 +clevict1 [STR0 + GPR1 * 8 + 192] diff --git a/bench/phi/sum.ptt b/bench/phi/sum.ptt new file mode 100644 index 000000000..e5d4c57be --- /dev/null +++ b/bench/phi/sum.ptt @@ -0,0 +1,10 @@ +STREAMS 1 +TYPE SINGLE +FLOPS 1 +BYTES 4 +LOOP 64 +vprefetch0 [STR0 + GPR1 * 8 + 1024] +vaddps zmm0, zmm0, [STR0 + GPR1 * 8] +vaddps zmm1, zmm1, [STR0 + GPR1 * 8 + 64] +vaddps zmm2, zmm2, [STR0 + GPR1 * 8 + 128] +vaddps zmm3, zmm3, [STR0 + GPR1 * 8 + 192] diff --git a/bench/phi/triad.ptt b/bench/phi/triad.ptt new file mode 100644 index 000000000..f38fe30ba --- /dev/null +++ b/bench/phi/triad.ptt @@ -0,0 +1,21 @@ +STREAMS 4 +TYPE DOUBLE +FLOPS 2 +BYTES 32 +LOOP 32 +vmovaps zmm0, [STR1 + GPR1*8] +vmovaps zmm1, [STR1 + GPR1*8+64] +vmovaps zmm2, [STR1 + GPR1*8+128] +vmovaps zmm3, [STR1 + GPR1*8+192] +vmovaps zmm4, [STR2 + GPR1*8] +vmovaps zmm5, [STR2 + GPR1*8+64] +vmovaps zmm6, [STR2 + GPR1*8+128] +vmovaps zmm7, [STR2 + GPR1*8+192] +vfmadd132pd zmm0, zmm4, [STR3 + GPR1*8] +vfmadd132pd zmm1, zmm5, [STR3 + GPR1*8+64] +vfmadd132pd zmm2, zmm6, [STR3 + GPR1*8+128] +vfmadd132pd zmm3, zmm7, [STR3 + GPR1*8+192] +vmovaps [STR0 + GPR1*8], zmm0 +vmovaps [STR0 + GPR1*8+64], zmm1 +vmovaps [STR0 + GPR1*8+128], zmm2 +vmovaps [STR0 + GPR1*8+192], zmm3 diff --git a/bench/phi/triad_mem.ptt b/bench/phi/triad_mem.ptt new file mode 100644 
index 000000000..a9babee76 --- /dev/null +++ b/bench/phi/triad_mem.ptt @@ -0,0 +1,29 @@ +STREAMS 4 +TYPE DOUBLE +FLOPS 2 +BYTES 32 +LOOP 32 +vprefetch0 [STR1 + GPR1 * 8 + 1024] +vprefetch0 [STR2 + GPR1 * 8 + 1024] +vprefetch0 [STR3 + GPR1 * 8 + 1024] +vmovaps zmm0, [STR1 + GPR1*8] +vmovaps zmm1, [STR1 + GPR1*8+64] +vmovaps zmm2, [STR1 + GPR1*8+128] +vmovaps zmm3, [STR1 + GPR1*8+192] +vmovaps zmm4, [STR2 + GPR1*8] +vmovaps zmm5, [STR2 + GPR1*8+64] +vmovaps zmm6, [STR2 + GPR1*8+128] +vmovaps zmm7, [STR2 + GPR1*8+192] +vfmadd132pd zmm0, zmm4, [STR3 + GPR1*8] +vfmadd132pd zmm1, zmm5, [STR3 + GPR1*8+64] +vfmadd132pd zmm2, zmm6, [STR3 + GPR1*8+128] +vfmadd132pd zmm3, zmm7, [STR3 + GPR1*8+192] +vprefetch0 [STR0 + GPR1 * 8 + 1024] +vmovnrngoaps [STR0 + GPR1 * 8], zmm0 +clevict1 [STR0 + GPR1 * 8] +vmovnrngoaps [STR0 + GPR1 * 8 + 64], zmm1 +clevict1 [STR0 + GPR1 * 8 + 64] +vmovnrngoaps [STR0 + GPR1 * 8 + 128], zmm2 +clevict1 [STR0 + GPR1 * 8 + 128] +vmovnrngoaps [STR0 + GPR1 * 8 + 192], zmm3 +clevict1 [STR0 + GPR1 * 8 + 192] diff --git a/bench/phi/update.ptt b/bench/phi/update.ptt new file mode 100644 index 000000000..a4d4e34e7 --- /dev/null +++ b/bench/phi/update.ptt @@ -0,0 +1,14 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 16 +LOOP 32 +vprefetch0 [STR0 + GPR1 * 8 + 1024] +vmovaps zmm0, [STR0 + GPR1 * 8] +vmovaps zmm1, [STR0 + GPR1 * 8 + 64] +vmovaps zmm2, [STR0 + GPR1 * 8 + 128] +vmovaps zmm3, [STR0 + GPR1 * 8 + 192] +vmovaps [STR0 + GPR1 * 8] , zmm0 +vmovaps [STR0 + GPR1 * 8 + 64], zmm1 +vmovaps [STR0 + GPR1 * 8 + 128], zmm2 +vmovaps [STR0 + GPR1 * 8 + 192], zmm3 diff --git a/bench/x86-64/clcopy.ptt b/bench/x86-64/clcopy.ptt new file mode 100644 index 000000000..b59c2bed3 --- /dev/null +++ b/bench/x86-64/clcopy.ptt @@ -0,0 +1,15 @@ +STREAMS 2 +TYPE DOUBLE +FLOPS 0 +BYTES 16 +LOOP 32 +movaps FPR1, [STR0 + GPR1 * 8 ] +movaps FPR2, [STR0 + GPR1 * 8 + 64 ] +movaps FPR3, [STR0 + GPR1 * 8 + 128 ] +movaps FPR4, [STR0 + GPR1 * 8 + 192 ] +movaps [STR1 + GPR1 * 8 ], FPR1 +movaps [STR1 + GPR1 * 8 + 64 ], FPR2 +movaps [STR1 + GPR1 * 8 + 128 ], FPR3 +movaps [STR1 + GPR1 * 8 + 192 ], FPR4 + + diff --git a/bench/x86-64/clload.ptt b/bench/x86-64/clload.ptt new file mode 100644 index 000000000..8c3ddc2b5 --- /dev/null +++ b/bench/x86-64/clload.ptt @@ -0,0 +1,11 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +LOOP 32 +movaps FPR1, [STR0 + GPR1 * 8] +movaps FPR2, [STR0 + GPR1 * 8 + 64] +movaps FPR3, [STR0 + GPR1 * 8 + 128] +movaps FPR4, [STR0 + GPR1 * 8 + 192] + + diff --git a/bench/x86-64/clstore.ptt b/bench/x86-64/clstore.ptt new file mode 100644 index 000000000..5541b8ec8 --- /dev/null +++ b/bench/x86-64/clstore.ptt @@ -0,0 +1,14 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +movaps FPR1, [SCALAR] +movaps FPR2, [SCALAR] +movaps FPR3, [SCALAR] +movaps FPR4, [SCALAR] +LOOP 32 +movaps [STR0 + GPR1 * 8], FPR1 +movaps [STR0 + GPR1 * 8 + 64], FPR2 +movaps [STR0 + GPR1 * 8 + 128], FPR3 +movaps [STR0 + GPR1 * 8 + 192], FPR4 + diff --git a/bench/x86-64/copy.ptt b/bench/x86-64/copy.ptt new file mode 100644 index 000000000..ffca4f5dc --- /dev/null +++ b/bench/x86-64/copy.ptt @@ -0,0 +1,15 @@ +STREAMS 2 +TYPE DOUBLE +FLOPS 0 +BYTES 16 +LOOP 8 +movaps FPR1, [STR0 + GPR1 * 8] +movaps FPR2, [STR0 + GPR1 * 8 + 16] +movaps FPR3, [STR0 + GPR1 * 8 + 32] +movaps FPR4, [STR0 + GPR1 * 8 + 48] +movaps [STR1 + GPR1 * 8] , FPR1 +movaps [STR1 + GPR1 * 8 + 16], FPR2 +movaps [STR1 + GPR1 * 8 + 32], FPR3 +movaps [STR1 + GPR1 * 8 + 48], FPR4 + + diff --git a/bench/x86-64/copy_mem.ptt b/bench/x86-64/copy_mem.ptt new file mode 100644 index 
000000000..fab5a667a --- /dev/null +++ b/bench/x86-64/copy_mem.ptt @@ -0,0 +1,15 @@ +STREAMS 2 +TYPE DOUBLE +FLOPS 0 +BYTES 16 +LOOP 8 +movaps FPR1, [STR0 + GPR1 * 8] +movaps FPR2, [STR0 + GPR1 * 8 + 16] +movaps FPR3, [STR0 + GPR1 * 8 + 32] +movaps FPR4, [STR0 + GPR1 * 8 + 48] +movntpd [STR1 + GPR1 * 8] , FPR1 +movntpd [STR1 + GPR1 * 8 + 16], FPR2 +movntpd [STR1 + GPR1 * 8 + 32], FPR3 +movntpd [STR1 + GPR1 * 8 + 48], FPR4 + + diff --git a/bench/x86-64/load.ptt b/bench/x86-64/load.ptt new file mode 100644 index 000000000..36aaab1c9 --- /dev/null +++ b/bench/x86-64/load.ptt @@ -0,0 +1,12 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +LOOP 8 +mov GPR12, [STR0 + GPR1 * 8 + 256] +movaps FPR1, [STR0 + GPR1 * 8] +movaps FPR2, [STR0 + GPR1 * 8 + 16] +movaps FPR3, [STR0 + GPR1 * 8 + 32] +movaps FPR4, [STR0 + GPR1 * 8 + 48] + + diff --git a/bench/x86-64/peak.ptt b/bench/x86-64/peak.ptt new file mode 100644 index 000000000..c03e2c8d7 --- /dev/null +++ b/bench/x86-64/peak.ptt @@ -0,0 +1,49 @@ +STREAMS 2 +TYPE DOUBLE +FLOPS 2 +BYTES 16 +INC 8 +movaps FPR1, [SCALAR] +sub GPR2, 4 +sub STR0, 32 +sub STR1, 32 +mov GPR1, GPR2 +neg GPR1 +.align 16 +1: +movaps FPR2, [STR0 + GPR1 * 8 ] +addpd FPR2, FPR1 +mulpd FPR2, FPR1 +movaps FPR6, [STR0 + GPR1 * 8 ] +addpd FPR2, FPR1 +mulpd FPR2, FPR1 +pshufd FPR2, FPR1, 0x1 +#movaps [STR1 + GPR1 * 8], FPR2 +movaps FPR3, [STR0 + GPR1 * 8 + 16] +addpd FPR3, FPR1 +mulpd FPR3, FPR1 +movaps FPR7, [STR0 + GPR1 * 8 + 16 ] +addpd FPR3, FPR1 +mulpd FPR3, FPR1 +pshufd FPR3, FPR1, 0x1 +#movaps [STR1 + GPR1 * 8 + 16], FPR3 +movaps FPR4, [STR0 + GPR1 * 8 + 32] +addpd FPR4, FPR1 +mulpd FPR4, FPR1 +movaps FPR8, [STR0 + GPR1 * 8 + 32 ] +addpd FPR4, FPR1 +mulpd FPR4, FPR1 +pshufd FPR4, FPR1, 0x1 +#movaps [STR1 + GPR1 * 8 + 32], FPR4 +movaps FPR5, [STR0 + GPR1 * 8 + 48] +addpd FPR5, FPR1 +mulpd FPR5, FPR1 +movaps FPR9, [STR0 + GPR1 * 8 + 48 ] +addpd FPR5, FPR1 +mulpd FPR5, FPR1 +pshufd FPR5, FPR1, 0x1 +#movaps [STR1 + GPR1 * 8 + 48], FPR5 +add GPR1, 8 +js 1b + + diff --git a/bench/x86-64/peakflops.ptt b/bench/x86-64/peakflops.ptt new file mode 100644 index 000000000..94c769afe --- /dev/null +++ b/bench/x86-64/peakflops.ptt @@ -0,0 +1,37 @@ +STREAMS 2 +TYPE DOUBLE +FLOPS 2 +BYTES 16 +INC 8 +movaps FPR1, [SCALAR] +sub GPR2, 4 +sub STR0, 32 +sub STR1, 32 +mov GPR1, GPR2 +neg GPR1 +.align 32 +1: +movaps FPR2, [STR0 + GPR1 * 8 ] +addpd FPR2, FPR1 +mulpd FPR2, FPR1 +addpd FPR2, FPR1 +mulpd FPR2, FPR1 +movaps FPR3, [STR0 + GPR1 * 8 + 16] +add GPR1, 8 +addpd FPR3, FPR1 +mulpd FPR3, FPR1 +addpd FPR3, FPR1 +mulpd FPR3, FPR1 +movaps FPR4, [STR0 + GPR1 * 8 - 32] +addpd FPR4, FPR1 +mulpd FPR4, FPR1 +addpd FPR4, FPR1 +mulpd FPR4, FPR1 +movaps FPR5, [STR0 + GPR1 * 8 - 16] +addpd FPR5, FPR1 +mulpd FPR5, FPR1 +addpd FPR5, FPR1 +mulpd FPR5, FPR1 +js 1b + + diff --git a/bench/x86-64/store.ptt b/bench/x86-64/store.ptt new file mode 100644 index 000000000..4ef9ab987 --- /dev/null +++ b/bench/x86-64/store.ptt @@ -0,0 +1,15 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +movaps FPR1, [SCALAR] +movaps FPR2, [SCALAR] +movaps FPR3, [SCALAR] +movaps FPR4, [SCALAR] +LOOP 8 +#mov GPR14, [STR0 + GPR1 * 8 + 256] +movaps [STR0 + GPR1 * 8] , FPR1 +movaps [STR0 + GPR1 * 8 + 16], FPR2 +movaps [STR0 + GPR1 * 8 + 32], FPR3 +movaps [STR0 + GPR1 * 8 + 48], FPR4 + diff --git a/bench/x86-64/store_mem.ptt b/bench/x86-64/store_mem.ptt new file mode 100644 index 000000000..0a0222d6a --- /dev/null +++ b/bench/x86-64/store_mem.ptt @@ -0,0 +1,14 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +movaps FPR1, [SCALAR] +movaps FPR2, [SCALAR] 
+movaps FPR3, [SCALAR] +movaps FPR4, [SCALAR] +LOOP 8 +movntpd [STR0 + GPR1 * 8] , FPR1 +movntpd [STR0 + GPR1 * 8 + 16], FPR2 +movntpd [STR0 + GPR1 * 8 + 32], FPR3 +movntpd [STR0 + GPR1 * 8 + 48], FPR4 + diff --git a/bench/x86-64/stream.ptt b/bench/x86-64/stream.ptt new file mode 100644 index 000000000..7c84c3c2d --- /dev/null +++ b/bench/x86-64/stream.ptt @@ -0,0 +1,23 @@ +STREAMS 3 +TYPE DOUBLE +FLOPS 2 +BYTES 24 +movaps FPR5, [SCALAR] +LOOP 8 +movaps FPR1, [STR1 + GPR1*8] +movaps FPR2, [STR1 + GPR1*8+16] +movaps FPR3, [STR1 + GPR1*8+32] +movaps FPR4, [STR1 + GPR1*8+48] +mulpd FPR1, FPR5 +addpd FPR1, [STR2 + GPR1*8] +mulpd FPR2, FPR5 +addpd FPR2, [STR2 + GPR1*8+16] +mulpd FPR3, FPR5 +addpd FPR3, [STR2 + GPR1*8+32] +mulpd FPR4, FPR5 +addpd FPR4, [STR2 + GPR1*8+48] +movaps [STR0 + GPR1*8] , FPR1 +movaps [STR0 + GPR1*8+16], FPR2 +movaps [STR0 + GPR1*8+32], FPR3 +movaps [STR0 + GPR1*8+48], FPR4 + diff --git a/bench/x86-64/stream_mem.ptt b/bench/x86-64/stream_mem.ptt new file mode 100644 index 000000000..b8364cc0b --- /dev/null +++ b/bench/x86-64/stream_mem.ptt @@ -0,0 +1,11 @@ +STREAMS 3 +TYPE DOUBLE +FLOPS 2 +BYTES 24 +movaps FPR5, [SCALAR] +LOOP 2 +movaps FPR1, [STR2 + GPR1*8] +mulpd FPR1, FPR5 +addpd FPR1, [STR1 + GPR1*8] +movntpd [STR0 + GPR1*8], FPR1 + diff --git a/bench/x86-64/sum.ptt b/bench/x86-64/sum.ptt new file mode 100644 index 000000000..337484340 --- /dev/null +++ b/bench/x86-64/sum.ptt @@ -0,0 +1,23 @@ +STREAMS 1 +TYPE SINGLE +FLOPS 1 +BYTES 4 +xorps FPR1, FPR1 +movaps FPR2, FPR1 +movaps FPR3, FPR1 +movaps FPR4, FPR1 +movaps FPR5, FPR1 +movaps FPR6, FPR1 +movaps FPR7, FPR1 +movaps FPR8, FPR1 +LOOP 32 +addps FPR1, [STR0 + GPR1 * 4] +addps FPR2, [STR0 + GPR1 * 4 + 16] +addps FPR3, [STR0 + GPR1 * 4 + 32] +addps FPR4, [STR0 + GPR1 * 4 + 48] +addps FPR5, [STR0 + GPR1 * 4 + 64] +addps FPR6, [STR0 + GPR1 * 4 + 80] +addps FPR7, [STR0 + GPR1 * 4 + 96] +addps FPR8, [STR0 + GPR1 * 4 + 112] + + diff --git a/bench/x86-64/sum_avx.ptt b/bench/x86-64/sum_avx.ptt new file mode 100644 index 000000000..e2e8e40f2 --- /dev/null +++ b/bench/x86-64/sum_avx.ptt @@ -0,0 +1,14 @@ +STREAMS 1 +TYPE SINGLE +FLOPS 1 +BYTES 4 +vxorps ymm1, ymm1, ymm1 +vmovaps ymm2, ymm1 +vmovaps ymm3, ymm1 +vmovaps ymm4, ymm1 +LOOP 32 +vaddps ymm1, ymm1, [STR0 + GPR1*4] +vaddps ymm2, ymm2, [STR0 + GPR1*4+32] +vaddps ymm3, ymm3, [STR0 + GPR1*4+64] +vaddps ymm4, ymm4, [STR0 + GPR1*4+96] + diff --git a/bench/x86-64/sum_plain.ptt b/bench/x86-64/sum_plain.ptt new file mode 100644 index 000000000..23fe2376c --- /dev/null +++ b/bench/x86-64/sum_plain.ptt @@ -0,0 +1,15 @@ +STREAMS 1 +TYPE SINGLE +FLOPS 1 +BYTES 4 +xorps FPR1, FPR1 +xorps FPR2, FPR2 +xorps FPR3, FPR3 +xorps FPR4, FPR4 +LOOP 4 +addss FPR1, [STR0 + GPR1 * 4] +addss FPR2, [STR0 + GPR1 * 4 + 4] +addss FPR3, [STR0 + GPR1 * 4 + 8] +addss FPR4, [STR0 + GPR1 * 4 + 12] + + diff --git a/bench/x86-64/triad.ptt b/bench/x86-64/triad.ptt new file mode 100644 index 000000000..d521aa093 --- /dev/null +++ b/bench/x86-64/triad.ptt @@ -0,0 +1,22 @@ +STREAMS 4 +TYPE DOUBLE +FLOPS 2 +BYTES 32 +LOOP 8 +movaps FPR1, [STR1 + GPR1*8] +movaps FPR2, [STR1 + GPR1*8+16] +movaps FPR3, [STR1 + GPR1*8+32] +movaps FPR4, [STR1 + GPR1*8+48] +mulpd FPR1, [STR2 + GPR1*8] +addpd FPR1, [STR3 + GPR1*8] +mulpd FPR2, [STR2 + GPR1*8+16] +addpd FPR2, [STR3 + GPR1*8+16] +mulpd FPR3, [STR2 + GPR1*8+32] +addpd FPR3, [STR3 + GPR1*8+32] +mulpd FPR4, [STR2 + GPR1*8+48] +addpd FPR4, [STR3 + GPR1*8+48] +movaps [STR0 + GPR1*8], FPR1 +movaps [STR0 + GPR1*8+16], FPR2 +movaps [STR0 + GPR1*8+32], FPR3 +movaps [STR0 + 
GPR1*8+48], FPR4 + diff --git a/bench/x86-64/triad_mem.ptt b/bench/x86-64/triad_mem.ptt new file mode 100644 index 000000000..7c24748dd --- /dev/null +++ b/bench/x86-64/triad_mem.ptt @@ -0,0 +1,10 @@ +STREAMS 4 +TYPE DOUBLE +FLOPS 2 +BYTES 32 +LOOP 2 +movaps FPR1, [STR1 + GPR1*8] +mulpd FPR1, [STR2 + GPR1*8] +addpd FPR1, [STR3 + GPR1*8] +movntpd [STR0 + GPR1*8], FPR1 + diff --git a/bench/x86-64/update.ptt b/bench/x86-64/update.ptt new file mode 100644 index 000000000..ac1129b6b --- /dev/null +++ b/bench/x86-64/update.ptt @@ -0,0 +1,15 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 16 +LOOP 8 +movaps FPR1, [STR0 + GPR1 * 8] +movaps [STR0 + GPR1 * 8] , FPR1 +movaps FPR2, [STR0 + GPR1 * 8 + 16] +movaps FPR3, [STR0 + GPR1 * 8 + 32] +movaps FPR4, [STR0 + GPR1 * 8 + 48] +movaps [STR0 + GPR1 * 8 + 16], FPR2 +movaps [STR0 + GPR1 * 8 + 32], FPR3 +movaps [STR0 + GPR1 * 8 + 48], FPR4 + + diff --git a/bench/x86/copy.ptt b/bench/x86/copy.ptt new file mode 100644 index 000000000..111d38ba2 --- /dev/null +++ b/bench/x86/copy.ptt @@ -0,0 +1,18 @@ +STREAMS 2 +TYPE DOUBLE +FLOPS 0 +BYTES 16 +mov GPR6, ARG1 +mov GPR2, STR0 +mov GPR3, STR1 +LOOP 8 +movaps FPR1, [GPR2 + GPR1 * 8] +movaps FPR2, [GPR2 + GPR1 * 8 + 16] +movaps FPR3, [GPR2 + GPR1 * 8 + 32] +movaps FPR4, [GPR2 + GPR1 * 8 + 48] +movaps [GPR3 + GPR1 * 8] , FPR1 +movaps [GPR3 + GPR1 * 8 + 16], FPR2 +movaps [GPR3 + GPR1 * 8 + 32], FPR3 +movaps [GPR3 + GPR1 * 8 + 48], FPR4 + + diff --git a/bench/x86/load.ptt b/bench/x86/load.ptt new file mode 100644 index 000000000..cf001a46d --- /dev/null +++ b/bench/x86/load.ptt @@ -0,0 +1,13 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +mov GPR6, ARG1 +mov GPR2, STR0 +LOOP 8 +movaps FPR1, [GPR2 + GPR1 * 8] +movaps FPR2, [GPR2 + GPR1 * 8 + 16] +movaps FPR3, [GPR2 + GPR1 * 8 + 32] +movaps FPR4, [GPR2 + GPR1 * 8 + 48] + + diff --git a/bench/x86/store.ptt b/bench/x86/store.ptt new file mode 100644 index 000000000..1cf15dac6 --- /dev/null +++ b/bench/x86/store.ptt @@ -0,0 +1,16 @@ +STREAMS 1 +TYPE DOUBLE +FLOPS 0 +BYTES 8 +movaps FPR1, [SCALAR] +movaps FPR2, [SCALAR] +movaps FPR3, [SCALAR] +movaps FPR4, [SCALAR] +mov GPR6, ARG1 +mov GPR2, STR0 +LOOP 8 +movaps [GPR2 + GPR1 * 8] , FPR1 +movaps [GPR2 + GPR1 * 8 + 16], FPR2 +movaps [GPR2 + GPR1 * 8 + 32], FPR3 +movaps [GPR2 + GPR1 * 8 + 48], FPR4 + diff --git a/bench/x86/stream.ptt b/bench/x86/stream.ptt new file mode 100644 index 000000000..bab4ecb7c --- /dev/null +++ b/bench/x86/stream.ptt @@ -0,0 +1,27 @@ +STREAMS 3 +TYPE DOUBLE +FLOPS 2 +BYTES 24 +movaps FPR5, [SCALAR] +mov GPR6, ARG1 +mov GPR2, STR0 +mov GPR3, STR1 +mov GPR4, STR2 +LOOP 8 +movaps FPR1, [GPR3 + GPR1*8] +movaps FPR2, [GPR3 + GPR1*8+16] +movaps FPR3, [GPR3 + GPR1*8+32] +movaps FPR4, [GPR3 + GPR1*8+48] +mulpd FPR1, FPR5 +addpd FPR1, [GPR4 + GPR1*8] +mulpd FPR2, FPR5 +addpd FPR2, [GPR4 + GPR1*8+16] +mulpd FPR3, FPR5 +addpd FPR3, [GPR4 + GPR1*8+32] +mulpd FPR4, FPR5 +addpd FPR4, [GPR4 + GPR1*8+48] +movaps [GPR2 + GPR1*8] , FPR1 +movaps [GPR2 + GPR1*8+16], FPR2 +movaps [GPR2 + GPR1*8+32], FPR3 +movaps [GPR2 + GPR1*8+48], FPR4 + diff --git a/config.mk b/config.mk new file mode 100644 index 000000000..697dac4e6 --- /dev/null +++ b/config.mk @@ -0,0 +1,57 @@ +# Please have a look in INSTALL and the WIKI for details on +# configuration options setup steps. 
+# supported: GCC, MIC (ICC) +COMPILER = GCC#NO SPACE + +# Define the color of the likwid-pin output +# Can be NONE, BLACK, RED, GREEN, YELLOW, BLUE, +# MAGENTA, CYAN or WHITE +COLOR = BLUE#NO SPACE + +# Path where to install likwid +PREFIX = /usr/local#NO SPACE +MANPREFIX = $(PREFIX)/man#NO SPACE + +# For the daemon-based secure msr/pci access, configure +# the absolute path to the msr daemon executable. +# $(PREFIX)/bin/likwid-accessD +ACCESSDAEMON = $(PREFIX)/bin/likwid-accessD#NO SPACE + +# Build the accessDaemon. Have a look in the WIKI for details. +BUILDDAEMON = false#NO SPACE + +# Set the default mode for MSR access. +# This can usually be overridden on the command line. +# Valid values are: direct, accessdaemon +ACCESSMODE = direct#NO SPACE + +# Change to true to build a shared library instead of a static one +SHARED_LIBRARY = true#NO SPACE + +# Build Fortran90 module interface for marker API. Adapt the Fortran compiler +# in ./make/include_<COMPILER>.mk if necessary. Default: ifort. +FORTRAN_INTERFACE = false#NO SPACE + +# Instrument likwid-bench for use with likwid-perfctr +INSTRUMENT_BENCH = true#NO SPACE + +# Instrument accesses to msr registers in likwid-perfctr +INSTRUMENT_COUNTER = false#NO SPACE + +# Use Portable Hardware Locality (hwloc) instead of CPUID +USE_HWLOC = true#NO SPACE + +# Usually you do not need to edit below +MAX_NUM_THREADS = 263 +MAX_NUM_NODES = 4 +HASH_TABLE_SIZE = 20 +CFG_FILE_PATH = /etc/likwid.cfg + +# Versioning Information +VERSION = 3 +RELEASE = 1 +DATE = 5.2.2014 + +LIBLIKWIDPIN = $(abspath $(PREFIX)/lib/liblikwidpin.so) +LIKWIDFILTERPATH = $(abspath $(PREFIX)/share/likwid) + diff --git a/doc/likwid-features.1 b/doc/likwid-features.1 new file mode 100644 index 000000000..4b7e2ced9 --- /dev/null +++ b/doc/likwid-features.1 @@ -0,0 +1,58 @@ +.TH LIKWID-FEATURES 1 likwid\- +.SH NAME +likwid-features \- print and toggle the flags of the MSR_IA32_MISC_ENABLE model specific register +.SH SYNOPSIS +.B likwid-features +.RB [ \-vh ] +.RB [ \-t +.IR coreId ] +.RB [ \-su +.IR prefetcher_tag ] +.SH DESCRIPTION +.B likwid-features +is a command line application to print the flags in the model +specific register (MSR) MSR_IA32_MISC_ENABLE on Intel x86 processors. On Core2 processors +it can be used to toggle the hardware prefetch flags. It does not work on AMD processors. +For documentation of which flags are supported on which processor, refer to the Intel +Software Developer's Manual Volume 3B, Table B.2. The MSR is set individually for every core. +The following hardware prefetchers can be toggled: +.IP \[bu] +.B HW_PREFETCHER: +Hardware prefetcher. +.IP \[bu] +.B CL_PREFETCHER: +Adjacent cache line prefetcher. +.IP \[bu] +.B DCU_PREFETCHER: +When the DCU prefetcher detects multiple loads from the same line done within a +time limit, the DCU prefetcher assumes the next line will be required. The next +line is prefetched into the L1 data cache from memory or L2. +.IP \[bu] +.B IP_PREFETCHER: +The IP prefetcher is an L1 data cache prefetcher. The IP prefetcher looks for +sequential load history to determine whether to prefetch the next expected data +into the L1 cache from memory or L2. + +.SH OPTIONS +.TP +.B \-\^v +prints version information to standard output, then exits. +.TP +.B \-\^h +prints a help message to standard output, then exits.
+.TP +.B \-\^t " coreId" +specify on which processor core the MSR should be read +.TP +.B \-\^u " HW_PREFETCHER | CL_PREFETCHER | DCU_PREFETCHER | IP_PREFETCHER" +specify which prefetcher to unset +.TP +.B \-\^s " HW_PREFETCHER | CL_PREFETCHER | DCU_PREFETCHER | IP_PREFETCHER" +specify which prefetcher to set + +.SH AUTHOR +Written by Jan Treibig . +.SH BUGS +Report Bugs on . +.SH "SEE ALSO" +likwid-topology(1), likwid-perfctr(1), likwid-pin(1) diff --git a/doc/likwid-perfctr.1 b/doc/likwid-perfctr.1 new file mode 100644 index 000000000..f6d0c527e --- /dev/null +++ b/doc/likwid-perfctr.1 @@ -0,0 +1,196 @@ +.TH LIKWID-PERFCTR 1 likwid\- +.SH NAME +likwid-perfctr \- configure and read out hardware performance counters on x86 CPUs +.SH SYNOPSIS +.B likwid-perfctr +.RB [\-vhHVmaiCst] +.RB [ \-c +.IR core_list ] +.RB [ \-g +.IR performance_group +or +.IR performance_event_string ] +.RB [ \-d +.IR frequency ] +.SH DESCRIPTION +.B likwid-perfctr +is a lightweight command line application to configure and read out hardware performance monitoring data +on supported x86 processors. It can measure either as a wrapper, without changing the measured application, +or with marker API functions inside the code, which turn the counters on and off. There are preconfigured +groups with useful event sets and derived metrics. Additionally, arbitrary events can be measured with +custom event sets. The marker API can measure multiple named regions. Results are accumulated on multiple calls. +The following x86 processors are supported: +.IP \[bu] +.B Intel Core 2: +all variants. Counters: +.I PMC0, PMC1, FIXC0, FIXC1, FIXC2 +.IP \[bu] +.B Intel Nehalem: +all variants. Counters: +.I PMC0, PMC1, PMC2, PMC3, UPMC0 - UPMC7, FIXC0, FIXC1, FIXC2 +.IP \[bu] +.B Intel Nehalem EX: +all variants, no uncore for the moment. Counters: +.I PMC0, PMC1, PMC2, PMC3, FIXC0, FIXC1, FIXC2 +.IP \[bu] +.B Intel Westmere: +all variants. Counters: +.I PMC0, PMC1, PMC2, PMC3, UPMC0 - UPMC7, FIXC0, FIXC1, FIXC2 +.IP \[bu] +.B Intel Sandy Bridge: +all variants, no uncore at the moment, experimental support. Counters: +.I PMC0, PMC1, PMC2, PMC3, FIXC0, FIXC1, FIXC2 +.IP \[bu] +.B Intel Pentium M: +Banias and Dothan variants. Counters: +.I PMC0, PMC1 +.IP \[bu] +.B Intel P6: +Tested on P3. +.IP \[bu] +.B AMD K8: +all variants. Counters: +.I PMC0, PMC1, PMC2, PMC3 +.IP \[bu] +.B AMD K10: +Barcelona, Shanghai, Istanbul, MagnyCours based processors. Counters: +.I PMC0, PMC1, PMC2, PMC3 + +.SH OPTIONS +.TP +.B \-\^v +prints version information to standard output, then exits. +.TP +.B \-\^h +prints a help message to standard output, then exits. +.TP +.B \-\^H +prints the group help message (use together with the -g switch). +.TP +.B \-\^V +verbose output during execution for debugging. +.TP +.B \-\^m +run in marker API mode +.TP +.B \-\^a +print available performance groups for the current processor, then exit. +.TP +.B \-\^e +print available counters and performance events of the current processor. +.TP +.B \-\^o +store all output to a file instead of stdout. For the filename the following placeholders are supported: +%j for PBS_JOBID, %r for MPI RANK (only Intel MPI at the moment), %h for the hostname and %p for the process pid. +The placeholders must be separated by underscores, e.g. -o test_%h_%p. You must specify a suffix to +the filename. For txt the output is printed as is to the file. Other suffixes trigger a filter on the output. +Available filters are csv (comma separated values) and xml at the moment.
+.TP +.B \-\^i +print cpuid information about the processor and its Intel Performance Monitoring features, then exit. +.TP +.B \-\^c " processor_list" +specify a numerical list of processors. The list may contain multiple +items, separated by comma, and ranges. For example 0,3,9-11. +.TP +.B \-\^C " processor_list" +specify a numerical list of processors. The list may contain multiple +items, separated by comma, and ranges. For example 0,3,9-11. This variant will +also pin the threads to the cores. Logical numberings can also be used. +.TP +.B \-\^g " performance group or performance event set string" +specify which performance group to measure. This can be one of the tags output with the -a flag. +A custom event set can also be specified by a comma separated list of events. Each event has the format +eventId:register, with the register being one of the architecture's supported performance counter registers. +.TP +.B \-\^d " frequency of measurements in seconds" +timeline mode for time-resolved measurements. The output has the format: +.TP +.B ... + +.SH EXAMPLE +Because +.B likwid-perfctr +measures on processors and not single applications, it is necessary to ensure +that processes and threads are pinned to dedicated resources. You can either pin the application yourself +or use the built-in pin functionality. +.IP 1. 4 +As a wrapper with a performance group: +.TP +.B likwid-perfctr -C 0-2 -g TLB ./cacheBench -n 2 -l 1048576 -i 100 -t Stream +.PP +The parent process is pinned to processor 0, thread 0 to processor 1 and thread 1 to processor 2. +.IP 2. 4 +As a wrapper with a custom event set on AMD: +.TP +.B likwid-perfctr -C 0-4 -g INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3 ./cacheBench +.PP +It is specified that the event +.B INSTRUCTIONS_RETIRED_SSE +is measured on counter +.B PMC0 +and the event +.B CPU_CLOCKS_UNHALTED +on counter +.B PMC3. +It is possible to calculate the runtime of all threads based on the +.B CPU_CLOCKS_UNHALTED +event. If you want this, you have to include this event in your custom event string as shown above. + +.IP 3. 4 +As a wrapper with a custom event set on Intel: +.TP +.B likwid-perfctr -C 0 -g INSTR_RETIRED_ANY:FIXC0,CPU_CLK_UNHALTED_CORE:FIXC1,UNC_L3_LINES_IN_ANY:UPMC0 ./stream-icc +.PP +On Intel processors fixed events are measured on dedicated counters. These are +.B INSTR_RETIRED_ANY +and +.B CPU_CLK_UNHALTED_CORE. +If you configure these fixed counters, +.B likwid-perfctr +will calculate the runtime and CPI metrics for your run. + +.IP 4. 4 +Using the marker API to measure only parts of your code (this can be used both with groups and custom event sets): +.TP +.B likwid-perfctr -m -C 0-4 -g INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3 ./cacheBench +.PP +You have to link your code against liblikwid.a and use the marker API calls. +The following code snippet shows the necessary calls: + +.nf +#include <likwid.h> + +/* only one thread calls init */ + if (threadId == 0) + { + likwid_markerInit(); + } + BARRIER; + likwid_markerStartRegion("Benchmark"); + /* your code to be measured is here */ + + likwid_markerStopRegion("Benchmark"); + BARRIER; + /* again only one thread can close the markers */ + if (threadId == 0) + { + likwid_markerClose(); + } +.fi + +.IP 5. 4 +Using likwid in timeline mode: +.TP +.B likwid-perfctr -c 0-3 -g FLOPS_DP -d 300ms ./cacheBench > out.txt +.PP +This will read out the counters every 300ms on physical cores 0-3 and write the results to out.txt. +For timeline mode there is a frontend application, likwid-scope, which enables live plotting of selected events. +For more code examples, have a look at the likwid WIKI pages.
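As a companion to the marker snippet above, the following is a self-contained sketch of the same pattern (an editorial example, not part of the man page: it assumes the marker API header is installed as likwid.h, linking with -llikwid -lpthread, and a run under likwid-perfctr -m; the thread count and the dummy workload are only illustrative):

#include <stdio.h>
#include <pthread.h>
#include <likwid.h>              /* assumed install name of the marker API header */

#define NTHREADS 4               /* illustrative thread count */

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    long threadId = (long) arg;
    double sum = 0.0;
    int i;

    /* only one thread calls init */
    if (threadId == 0)
        likwid_markerInit();
    pthread_barrier_wait(&barrier);

    likwid_markerStartRegion("Benchmark");
    /* the code to be measured goes here; a dummy workload for illustration */
    for (i = 0; i < 10000000; i++)
        sum += 0.5 * (double) i;
    likwid_markerStopRegion("Benchmark");

    pthread_barrier_wait(&barrier);
    /* again only one thread closes the markers */
    if (threadId == 0)
        likwid_markerClose();

    printf("thread %ld: sum = %e\n", threadId, sum);
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    long i;

    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}

Run, e.g., as likwid-perfctr -m -C 0-4 -g FLOPS_DP ./a.out; the named region "Benchmark" then appears in the report.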
+ +.SH AUTHOR +Written by Jan Treibig . +.SH BUGS +Report Bugs on . +.SH SEE ALSO +likwid-topology(1), likwid-features(1), likwid-pin(1), likwid-bench(1) diff --git a/doc/likwid-pin.1 b/doc/likwid-pin.1 new file mode 100644 index 000000000..9d95365a4 --- /dev/null +++ b/doc/likwid-pin.1 @@ -0,0 +1,120 @@ +.TH LIKWID-PIN 1 likwid\-VERSION +.SH NAME +likwid-pin \- pin a sequential or threaded application to dedicated processors +.SH SYNOPSIS +.B likwid-pin +.RB [\-vh] +.RB [ \-c +.IR corelist ] +.RB [ \-s +.IR skip_mask ] +.RB [ \-S +.IR Sweep memory before run ] +.RB [ \-p] +.RB [ \-q] +.RB [ \-i] +.SH DESCRIPTION +.B likwid-pin +is a command line application to pin a sequential or multi-threaded +application to dedicated processors. It can be used as a replacement for taskset. +In contrast to taskset, single processors are specified instead of an affinity mask. +For multi-threaded applications based on the pthread library, the +.I pthread_create +library call is overloaded through LD_PRELOAD and each created thread is pinned +to a dedicated processor as specified in +.I core_list . +.PP +By default, every created thread is pinned to a core in the order of calls +to pthread_create. It is possible to skip single threads. +.PP +For OpenMP implementations, the gcc and icc compilers are explicitly supported. Others may also work. +.B likwid-pin +sets the environment variable OMP_NUM_THREADS for you if not already present. +It will set as many threads as are present in the pin expression. Be aware that +with pthreads the parent thread is always pinned. If you create for example 4 +threads with pthread_create and do not use the parent process as worker, you +still have to provide num_threads+1 processor ids. +.PP +.B likwid-pin +supports different numberings for pinning. By default, the physical numbering of +the cores is used. This is the numbering that likwid-topology reports. But +logical numbering inside the node or the sockets can also be used. If used +with an N prefix (e.g. -c N:0-6), the cores are numbered logically over the whole node. +Physical cores come first. If a system e.g. has 8 cores with 16 SMT threads, +with -c N:0-7 you get all physical cores. If you specify -c N:0-15 you get all +physical cores and all SMT threads. With S you can specify logical numberings +inside sockets; again, physical cores come first. You can mix different domains +with a @. With -c S0:0-3@S2:2-3 you pin threads 0-3 to logical cores 0-3 on socket 0 +and threads 4-5 to logical cores 2-3 on socket 2. +.PP +For applications where the first-touch policy on NUMA systems cannot be employed, +.B likwid-pin +can be used to turn on interleaved memory placement. This can significantly +speed up the performance of memory-bound multi-threaded codes. All NUMA nodes +the user pinned threads to are used for interleaving. + +.SH OPTIONS +.TP +.B \-\^v +prints version information to standard output, then exits. +.TP +.B \-\^h +prints a help message to standard output, then exits. +.TP +.B \-\^c " processor_list OR thread expression OR scatter policy " +specify a numerical list of processors. The list may contain multiple +items, separated by comma, and ranges. For example 0,3,9-11. You can also use +logical numberings, either within a node (N), a socket (S) or a numa domain (M). +likwid-pin also supports logical pinning within a cpuset with an L prefix.
If you omit this option, +likwid-pin will pin the threads to the processors on the node with physical cores first. +See below for details on using a thread expression or scatter policy. +.TP +.B \-\^s " skip_mask" +Specify skip mask as HEX number. For each set bit the corresponding thread is skipped. +.TP +.B \-\^S " enable memory sweeper" +All ccNUMA memory domains belonging to the specified threadlist will be cleaned before the run. Can solve file buffer cache problems on Linux. +.TP +.B \-\^p +prints the available thread domains for logical pinning +.TP +.B \-\^i +set the NUMA memory policy to interleave across all NUMA nodes involved in the pinning +.TP +.B \-\^q +silent execution without output + + +.SH EXAMPLE +.IP 1. 4 +For a standard pthread application: +.TP +.B likwid-pin -c 0,2,4-6 ./myApp +.PP +The parent process is pinned to processor 0, thread 0 to processor 2, thread +1 to processor 4, thread 2 to processor 5 and thread 3 to processor 6. If more threads +are created than specified in the processor list, these threads are pinned to processor 0 +as fallback. +.IP 2. 4 +For gcc OpenMP, as many ids must be specified in the processor list as there are threads: +.TP +.B OMP_NUM_THREADS=4; likwid-pin -c 0,2,1,3 ./myApp +.IP 3. 4 +For Intel icc OpenMP the flag +.B \-\^t +.I intel +must be set. +.TP +.B OMP_NUM_THREADS=4; likwid-pin -t intel -c S0:0,1@S1:0,1 ./myApp +.IP 4. 4 +Full control over the pinning can be achieved by specifying a skip mask. +For example, the above case for Intel OpenMP can also be achieved with: +.TP +.B OMP_NUM_THREADS=4; likwid-pin -s 0x1 -c 0,2,1,3 ./myApp + +.SH AUTHOR +Written by Jan Treibig . +.SH BUGS +Report Bugs on . +.SH "SEE ALSO" +taskset(1), likwid-perfctr(1), likwid-features(1), likwid-topology(1) diff --git a/doc/likwid-powermeter.1 b/doc/likwid-powermeter.1 new file mode 100644 index 000000000..a05d52852 --- /dev/null +++ b/doc/likwid-powermeter.1 @@ -0,0 +1,41 @@ +.TH LIKWID-POWERMETER 1 likwid\- +.SH NAME +likwid-powermeter \- a tool to print power and clocking information on Intel CPUs +.SH SYNOPSIS +.B likwid-powermeter +.RB [ \-vh ] +.RB [ \-c +.IR socketId ] +.RB [ \-s +.IR duration in seconds ] +.SH DESCRIPTION +.B likwid-powermeter +is a command line application to get the energy consumption on Intel RAPL-capable processors. Currently +only Intel SandyBridge is supported. It also prints information about the TDP and the supported Turbo Mode steps. +The Turbo Mode information works on all Turbo mode enabled Intel processors. The tool can be used either +in stethoscope mode for a specified duration or as a wrapper to your application, measuring your complete +run. RAPL works on a per-package (socket) basis. +Please note that the RAPL counters are also accessible as normal events within likwid-perfctr. +.SH OPTIONS +.TP +.B \-\^v +prints version information to standard output, then exits. +.TP +.B \-\^h +prints a help message to standard output, then exits. +.TP +.B \-\^c " socketId" +set on which socket the RAPL interface is accessed. +.TP +.B \-\^p +prints information about dynamic clocks and CPI on the measured socket. +.TP +.B \-\^i +prints information about TDP and Turbo mode steps + +.SH AUTHOR +Written by Jan Treibig . +.SH BUGS +Report Bugs on .
+.SH "SEE ALSO" +likwid-topology(1), likwid-perfctr(1), likwid-pin(1) diff --git a/doc/likwid-topology.1 b/doc/likwid-topology.1 new file mode 100644 index 000000000..911943156 --- /dev/null +++ b/doc/likwid-topology.1 @@ -0,0 +1,36 @@ +.TH LIKWID-TOPOLOGY 1 likwid\- +.SH NAME +likwid-topology \- print thread and cache topology +.SH SYNOPSIS +.B likwid-topology +.RB [\-hvgcC] +.SH DESCRIPTION +.B likwid-topology +is a command line application to print the thread and cache +topology on multicore x86 processors. Used with monospaced fonts it can +draw the processor topology of a machine in ASCII art. Beyond topology, +likwid-topology determines the clock of a processor and prints detailed +information about the cache hierarchy. +.SH OPTIONS +.TP +.B \-v +prints version information to standard output, then exits. +.TP +.B \-h +prints a help message to standard output, then exits. +.TP +.B \-g +prints topology information in ASCII art. Best viewed with a monospaced font. +.TP +.B \-c +prints detailed information about the cache hierarchy +.TP +.B \-C +measures and outputs the processor clock. This involves a longer runtime of likwid-topology. + +.SH AUTHOR +Written by Jan Treibig . +.SH BUGS +Report Bugs on . +.SH "SEE ALSO" +likwid-perfctr(1), likwid-features(1), likwid-pin(1) diff --git a/ext/hwloc/AUTHORS b/ext/hwloc/AUTHORS new file mode 100644 index 000000000..837b27f2c --- /dev/null +++ b/ext/hwloc/AUTHORS @@ -0,0 +1,8 @@ +Cédric Augonnet +Jérôme Clet-Ortega +Ludovic Courtès +Brice Goglin +Nathalie Furmento +Samuel Thibault +Jeff Squyres +Alexey Kardashevskiy diff --git a/ext/hwloc/COPYING b/ext/hwloc/COPYING new file mode 100644 index 000000000..32128c7f2 --- /dev/null +++ b/ext/hwloc/COPYING @@ -0,0 +1,28 @@ +Copyright © 2009 CNRS +Copyright © 2009 inria. All rights reserved. +Copyright © 2009 Université Bordeaux 1 +Copyright © 2009 Cisco Systems, Inc. All rights reserved. +Copyright © 2012 Blue Brain Project, EPFL. All rights reserved. +See COPYING in top-level directory. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions +are met: +1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. +2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. +3. The name of the author may not be used to endorse or promote products + derived from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR +IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES +OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. +IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, +INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT +NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF +THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/ext/hwloc/Makefile b/ext/hwloc/Makefile new file mode 100644 index 000000000..45920e8ae --- /dev/null +++ b/ext/hwloc/Makefile @@ -0,0 +1,53 @@ +SRC_DIRS = ./src +MAKE_DIR = ../../make + +#DO NOT EDIT BELOW + +include ../../config.mk +include $(MAKE_DIR)/include_$(COMPILER).mk + +CFLAGS = -O2 -Wall -fPIC +INCLUDES = -I./include +DEFINES = +LIBS = -lm -Wl,-E +LFLAGS = +Q ?= @ + +#CONFIGURE BUILD SYSTEM +BUILD_DIR = ./$(COMPILER) + +VPATH = $(SRC_DIRS) +FILES = $(notdir $(foreach dir,$(SRC_DIRS),$(wildcard $(dir)/*.c))) +OBJ = $(patsubst %.c, $(BUILD_DIR)/%.o, $(FILES)) + +LIBHWLOC = libhwloc.a + +CPPFLAGS := $(CPPFLAGS) $(DEFINES) $(INCLUDES) + +all: $(BUILD_DIR) $(OBJ) $(LIBHWLOC) + +$(BUILD_DIR): + @mkdir $(BUILD_DIR) + +$(LIBHWLOC): + $(Q)${AR} -cq $(LIBHWLOC) $(OBJ) + +#PATTERN RULES +$(BUILD_DIR)/%.o: %.c + ${Q}$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@ + ${Q}$(CC) $(CPPFLAGS) -MT $(@:.d=.o) -MM $< > $(BUILD_DIR)/$*.d + +ifeq ($(findstring $(MAKECMDGOALS),clean),) +-include $(OBJ:.o=.d) +endif + +.PHONY: clean distclean + +clean: + @rm -rf $(BUILD_DIR) $(LIBHWLOC) + +distclean: clean + @rm -f $(TARGET) + + + diff --git a/ext/hwloc/include/hwloc.h b/ext/hwloc/include/hwloc.h new file mode 100644 index 000000000..c4fda856f --- /dev/null +++ b/ext/hwloc/include/hwloc.h @@ -0,0 +1,2258 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2013 Inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/*===================================================================== + * PLEASE GO READ THE DOCUMENTATION! + * ------------------------------------------------ + * $tarball_directory/doc/doxygen-doc/ + * or + * http://www.open-mpi.org/projects/hwloc/doc/ + *===================================================================== + * + * FAIR WARNING: Do NOT expect to be able to figure out all the + * subtleties of hwloc by simply reading function prototypes and + * constant descriptions here in this file. + * + * Hwloc has wonderful documentation in both PDF and HTML formats for + * your reading pleasure. The formal documentation explains a LOT of + * hwloc-specific concepts, provides definitions, and discusses the + * "big picture" for many of the things that you'll find here in this + * header file. + * + * The PDF/HTML documentation was generated via Doxygen; much of what + * you'll see in there is also here in this file. BUT THERE IS A LOT + * THAT IS IN THE PDF/HTML THAT IS ***NOT*** IN hwloc.h! + * + * There are entire paragraph-length descriptions, discussions, and + * pretty pictures to explain subtle corner cases, provide concrete + * examples, etc. + * + * Please, go read the documentation. :-) + * + *=====================================================================*/ + +/** \file + * \brief The hwloc API. + * + * See hwloc/bitmap.h for bitmap specific macros. + * See hwloc/helper.h for high-level topology traversal helpers. + * See hwloc/inlines.h for the actual inline code of some functions below. + */ + +#ifndef HWLOC_H +#define HWLOC_H + +#include <hwloc/autogen/config.h> +#include <sys/types.h> +#include <stdio.h> +#include <string.h> +#include <limits.h> + +/* + * Symbol transforms + */ +#include <hwloc/rename.h> + +/* + * Bitmap definitions + */ + +#include <hwloc/bitmap.h> + + +#ifdef __cplusplus +extern "C" { +#endif + + +/** \defgroup hwlocality_api_version API version + * @{ + */ + +/** \brief Indicate at build time which hwloc API version is being used.
*/ +#define HWLOC_API_VERSION 0x00010800 + +/** \brief Indicate at runtime which hwloc API version was used at build time. */ +HWLOC_DECLSPEC unsigned hwloc_get_api_version(void); + +/** \brief Current component and plugin ABI version (see hwloc/plugins.h) */ +#define HWLOC_COMPONENT_ABI 3 + +/** @} */ + + + +/** \defgroup hwlocality_object_sets Object Sets (hwloc_cpuset_t and hwloc_nodeset_t) + * + * Hwloc uses bitmaps to represent two distinct kinds of object sets: + * CPU sets (::hwloc_cpuset_t) and NUMA node sets (::hwloc_nodeset_t). + * These types are both typedefs to a common back end type + * (::hwloc_bitmap_t), and therefore all the hwloc bitmap functions + * are applicable to both ::hwloc_cpuset_t and ::hwloc_nodeset_t (see + * \ref hwlocality_bitmap). + * + * The rationale for having two different types is that even though + * the actions one wants to perform on these types are the same (e.g., + * enable and disable individual items in the set/mask), they're used + * in very different contexts: one for specifying which processors to + * use and one for specifying which NUMA nodes to use. Hence, the + * name difference is really just to reflect the intent of where the + * type is used. + * + * @{ + */ + +/** \brief A CPU set is a bitmap whose bits are set according to CPU + * physical OS indexes. + * + * It may be consulted and modified with the bitmap API as any + * ::hwloc_bitmap_t (see hwloc/bitmap.h). + */ +typedef hwloc_bitmap_t hwloc_cpuset_t; +/** \brief A non-modifiable ::hwloc_cpuset_t. */ +typedef hwloc_const_bitmap_t hwloc_const_cpuset_t; + +/** \brief A node set is a bitmap whose bits are set according to NUMA + * memory node physical OS indexes. + * + * It may be consulted and modified with the bitmap API as any + * ::hwloc_bitmap_t (see hwloc/bitmap.h). + * + * When binding memory on a system without any NUMA node + * (when the whole memory is considered as a single memory bank), + * the nodeset may be either empty (no memory selected) + * or full (whole system memory selected). + * + * See also \ref hwlocality_helper_nodeset_convert. + */ +typedef hwloc_bitmap_t hwloc_nodeset_t; +/** \brief A non-modifiable ::hwloc_nodeset_t. + */ +typedef hwloc_const_bitmap_t hwloc_const_nodeset_t; + +/** @} */ + + + +/** \defgroup hwlocality_object_types Object Types + * @{ + */ + +/** \brief Type of topology object. + * + * \note Do not rely on the ordering or completeness of the values as new ones + * may be defined in the future! If you need to compare types, use + * hwloc_compare_types() instead. + */ +typedef enum { + /* *************************************************************** + WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING + + If new enum values are added here, you MUST also go update the + obj_type_order[] and obj_order_type[] arrays in src/topology.c. + + WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING + *************************************************************** */ + + HWLOC_OBJ_SYSTEM, /**< \brief Whole system (may be a cluster of machines). + * The whole system that is accessible to hwloc. + * That may comprise several machines in SSI systems + * like Kerrighed. + */ + HWLOC_OBJ_MACHINE, /**< \brief Machine. + * The typical root object type. + * A set of processors and memory with cache + * coherency. + */ + HWLOC_OBJ_NODE, /**< \brief NUMA node. + * A set of processors around memory which the + * processors can directly access. + */ + HWLOC_OBJ_SOCKET, /**< \brief Socket, physical package, or chip. 
+ * In the physical meaning, i.e. that you can add + * or remove physically. + */ + HWLOC_OBJ_CACHE, /**< \brief Cache. + * Can be L1i, L1d, L2, L3, ... + */ + HWLOC_OBJ_CORE, /**< \brief Core. + * A computation unit (may be shared by several + * logical processors). + */ + HWLOC_OBJ_PU, /**< \brief Processing Unit, or (Logical) Processor. + * An execution unit (may share a core with some + * other logical processors, e.g. in the case of + * an SMT core). + * + * Objects of this kind are always reported and can + * thus be used as fallback when others are not. + */ + + HWLOC_OBJ_GROUP, /**< \brief Group objects. + * Objects which do not fit in the above but are + * detected by hwloc and are useful to take into + * account for affinity. For instance, some operating systems + * expose their arbitrary processor aggregations this + * way. And hwloc may insert such objects to group + * NUMA nodes according to their distances. + * + * These objects are ignored when they do not bring + * any structure. + */ + + HWLOC_OBJ_MISC, /**< \brief Miscellaneous objects. + * Objects without particular meaning, that can e.g. be + * added by the application for its own use. + */ + + HWLOC_OBJ_BRIDGE, /**< \brief Bridge. + * Any bridge that connects the host or an I/O bus + * to another I/O bus. + * Bridge objects have neither CPU sets nor node sets. + * They are not added to the topology unless I/O discovery + * is enabled with hwloc_topology_set_flags(). + */ + HWLOC_OBJ_PCI_DEVICE, /**< \brief PCI device. + * These objects have neither CPU sets nor node sets. + * They are not added to the topology unless I/O discovery + * is enabled with hwloc_topology_set_flags(). + */ + HWLOC_OBJ_OS_DEVICE, /**< \brief Operating system device. + * These objects have neither CPU sets nor node sets. + * They are not added to the topology unless I/O discovery + * is enabled with hwloc_topology_set_flags(). + */ + + HWLOC_OBJ_TYPE_MAX /**< \private Sentinel value */ + + /* *************************************************************** + WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING + + If new enum values are added here, you MUST also go update the + obj_type_order[] and obj_order_type[] arrays in src/topology.c. + + WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING + *************************************************************** */ +} hwloc_obj_type_t; + +/** \brief Cache type. */ +typedef enum hwloc_obj_cache_type_e { + HWLOC_OBJ_CACHE_UNIFIED, /**< \brief Unified cache. */ + HWLOC_OBJ_CACHE_DATA, /**< \brief Data cache. */ + HWLOC_OBJ_CACHE_INSTRUCTION /**< \brief Instruction cache. + * Only used when the HWLOC_TOPOLOGY_FLAG_ICACHES topology flag is set. */ +} hwloc_obj_cache_type_t; + +/** \brief Type of one side (upstream or downstream) of an I/O bridge. */ +typedef enum hwloc_obj_bridge_type_e { + HWLOC_OBJ_BRIDGE_HOST, /**< \brief Host-side of a bridge, only possible upstream. */ + HWLOC_OBJ_BRIDGE_PCI /**< \brief PCI-side of a bridge. */ +} hwloc_obj_bridge_type_t; + +/** \brief Type of an OS device. */ +typedef enum hwloc_obj_osdev_type_e { + HWLOC_OBJ_OSDEV_BLOCK, /**< \brief Operating system block device. + * For instance "sda" on Linux. */ + HWLOC_OBJ_OSDEV_GPU, /**< \brief Operating system GPU device. + * For instance ":0.0" for a GL display, + * "card0" for a Linux DRM device. */ + HWLOC_OBJ_OSDEV_NETWORK, /**< \brief Operating system network device. + * For instance the "eth0" interface on Linux. */ + HWLOC_OBJ_OSDEV_OPENFABRICS, /**< \brief Operating system openfabrics device.
+ * For instance the "mlx4_0" InfiniBand HCA device on Linux. */ + HWLOC_OBJ_OSDEV_DMA, /**< \brief Operating system dma engine device. + * For instance the "dma0chan0" DMA channel on Linux. */ + HWLOC_OBJ_OSDEV_COPROC /**< \brief Operating system co-processor device. + * For instance "mic0" for a Xeon Phi (MIC) on Linux, + * "opencl0d0" for a OpenCL device, + * "cuda0" for a CUDA device. */ +} hwloc_obj_osdev_type_t; + +/** \brief Compare the depth of two object types + * + * Types shouldn't be compared as they are, since newer ones may be added in + * the future. This function returns less than, equal to, or greater than zero + * respectively if \p type1 objects usually include \p type2 objects, are the + * same as \p type2 objects, or are included in \p type2 objects. If the types + * can not be compared (because neither is usually contained in the other), + * HWLOC_TYPE_UNORDERED is returned. Object types containing CPUs can always + * be compared (usually, a system contains machines which contain nodes which + * contain sockets which contain caches, which contain cores, which contain + * processors). + * + * \note HWLOC_OBJ_PU will always be the deepest. + * \note This does not mean that the actual topology will respect that order: + * e.g. as of today cores may also contain caches, and sockets may also contain + * nodes. This is thus just to be seen as a fallback comparison method. + */ +HWLOC_DECLSPEC int hwloc_compare_types (hwloc_obj_type_t type1, hwloc_obj_type_t type2) __hwloc_attribute_const; + +enum hwloc_compare_types_e { + HWLOC_TYPE_UNORDERED = INT_MAX /**< \brief Value returned by hwloc_compare_types when types can not be compared. \hideinitializer */ +}; + +/** @} */ + + + +/** \defgroup hwlocality_objects Object Structure and Attributes + * @{ + */ + +union hwloc_obj_attr_u; + +/** \brief Object memory */ +struct hwloc_obj_memory_s { + hwloc_uint64_t total_memory; /**< \brief Total memory (in bytes) in this object and its children */ + hwloc_uint64_t local_memory; /**< \brief Local memory (in bytes) */ + + /** \brief Size of array \p page_types */ + unsigned page_types_len; + /** \brief Array of local memory page types, \c NULL if no local memory and \p page_types is 0. + * + * The array is sorted by increasing \p size fields. + * It contains \p page_types_len slots. + */ + struct hwloc_obj_memory_page_type_s { + hwloc_uint64_t size; /**< \brief Size of pages */ + hwloc_uint64_t count; /**< \brief Number of pages of this size */ + } * page_types; +}; + +/** \brief Structure of a topology object + * + * Applications must not modify any field except hwloc_obj.userdata. + */ +struct hwloc_obj { + /* physical information */ + hwloc_obj_type_t type; /**< \brief Type of object */ + unsigned os_index; /**< \brief OS-provided physical index number */ + char *name; /**< \brief Object description if any */ + + struct hwloc_obj_memory_s memory; /**< \brief Memory attributes */ + + union hwloc_obj_attr_u *attr; /**< \brief Object type-specific Attributes, + * may be \c NULL if no attribute value was found */ + + /* global position */ + unsigned depth; /**< \brief Vertical index in the hierarchy. + * If the topology is symmetric, this is equal to the + * parent depth plus one, and also equal to the number + * of parent/child links from the root object to here. 
+ */ + unsigned logical_index; /**< \brief Horizontal index in the whole list of similar objects, + * could be a "cousin_rank" since it's the rank within the "cousin" list below */ + signed os_level; /**< \brief OS-provided physical level, -1 if unknown or meaningless */ + + /* cousins are all objects of the same type (and depth) across the entire topology */ + struct hwloc_obj *next_cousin; /**< \brief Next object of same type and depth */ + struct hwloc_obj *prev_cousin; /**< \brief Previous object of same type and depth */ + + /* children of the same parent are siblings, even if they may have different type and depth */ + struct hwloc_obj *parent; /**< \brief Parent, \c NULL if root (system object) */ + unsigned sibling_rank; /**< \brief Index in parent's \c children[] array */ + struct hwloc_obj *next_sibling; /**< \brief Next object below the same parent */ + struct hwloc_obj *prev_sibling; /**< \brief Previous object below the same parent */ + + /* children array below this object */ + unsigned arity; /**< \brief Number of children */ + struct hwloc_obj **children; /**< \brief Children, \c children[0 .. arity -1] */ + struct hwloc_obj *first_child; /**< \brief First child */ + struct hwloc_obj *last_child; /**< \brief Last child */ + + /* misc */ + void *userdata; /**< \brief Application-given private data pointer, + * initialized to \c NULL, use it as you wish. + * See hwloc_topology_set_userdata_export_callback() + * if you wish to export this field to XML. */ + + /* cpusets and nodesets */ + hwloc_cpuset_t cpuset; /**< \brief CPUs covered by this object + * + * This is the set of CPUs for which there are PU objects in the topology + * under this object, i.e. which are known to be physically contained in this + * object and known how (the children path between this object and the PU + * objects). + * + * If the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM configuration flag is set, some of + * these CPUs may be offline, or not allowed for binding, see online_cpuset + * and allowed_cpuset. + * + * \note Its value must not be changed, hwloc_bitmap_dup must be used instead. + */ + hwloc_cpuset_t complete_cpuset; /**< \brief The complete CPU set of logical processors of this object, + * + * This includes not only the same as the cpuset field, but also the CPUs for + * which topology information is unknown or incomplete, and the CPUs that are + * ignored when the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag is not set. + * Thus no corresponding PU object may be found in the topology, because the + * precise position is undefined. It is however known that it would be somewhere + * under this object. + * + * \note Its value must not be changed, hwloc_bitmap_dup must be used instead. + */ + hwloc_cpuset_t online_cpuset; /**< \brief The CPU set of online logical processors + * + * This includes the CPUs contained in this object that are online, i.e. draw + * power and can execute threads. It may however not be allowed to bind to + * them due to administration rules, see allowed_cpuset. + * + * \note Its value must not be changed, hwloc_bitmap_dup must be used instead. + */ + hwloc_cpuset_t allowed_cpuset; /**< \brief The CPU set of allowed logical processors + * + * This includes the CPUs contained in this object which are allowed for + * binding, i.e. passing them to the hwloc binding functions should not return + * permission errors. This is usually restricted by administration rules. + * Some of them may however be offline so binding to them may still not be + * possible, see online_cpuset. 
+ * + * \note Its value must not be changed, hwloc_bitmap_dup must be used instead. + */ + + hwloc_nodeset_t nodeset; /**< \brief NUMA nodes covered by this object or containing this object + * + * This is the set of NUMA nodes for which there are NODE objects in the + * topology under or above this object, i.e. which are known to be physically + * contained in this object or containing it and known how (the children path + * between this object and the NODE objects). + * + * In the end, these nodes are those that are close to the current object. + * + * If the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM configuration flag is set, some of + * these nodes may not be allowed for allocation, see allowed_nodeset. + * + * If there are no NUMA nodes in the machine, all the memory is close to this + * object, so \p nodeset is full. + * + * \note Its value must not be changed, hwloc_bitmap_dup must be used instead. + */ + hwloc_nodeset_t complete_nodeset; /**< \brief The complete NUMA node set of this object, + * + * This includes not only the same as the nodeset field, but also the NUMA + * nodes for which topology information is unknown or incomplete, and the nodes + * that are ignored when the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag is not set. + * Thus no corresponding NODE object may be found in the topology, because the + * precise position is undefined. It is however known that it would be + * somewhere under this object. + * + * If there are no NUMA nodes in the machine, all the memory is close to this + * object, so \p complete_nodeset is full. + * + * \note Its value must not be changed, hwloc_bitmap_dup must be used instead. + */ + hwloc_nodeset_t allowed_nodeset; /**< \brief The set of allowed NUMA memory nodes + * + * This includes the NUMA memory nodes contained in this object which are + * allowed for memory allocation, i.e. passing them to NUMA node-directed + * memory allocation should not return permission errors. This is usually + * restricted by administration rules. + * + * If there are no NUMA nodes in the machine, all the memory is close to this + * object, so \p allowed_nodeset is full. + * + * \note Its value must not be changed, hwloc_bitmap_dup must be used instead. + */ + + struct hwloc_distances_s **distances; /**< \brief Distances between all objects at same depth below this object */ + unsigned distances_count; + + struct hwloc_obj_info_s *infos; /**< \brief Array of stringified info type=name. */ + unsigned infos_count; /**< \brief Size of infos array. */ + + int symmetric_subtree; /**< \brief Set if the subtree of objects below this object is symmetric, + * which means all children and their children have identical subtrees. + * If set in the topology root object, lstopo may export the topology + * as a synthetic string. + */ +}; +/** + * \brief Convenience typedef; a pointer to a struct hwloc_obj. + */ +typedef struct hwloc_obj * hwloc_obj_t; + +/** \brief Object type-specific Attributes */ +union hwloc_obj_attr_u { + /** \brief Cache-specific Object Attributes */ + struct hwloc_cache_attr_s { + hwloc_uint64_t size; /**< \brief Size of cache in bytes */ + unsigned depth; /**< \brief Depth of cache (e.g., L1, L2, ...etc.) */ + unsigned linesize; /**< \brief Cache-line size in bytes. 
0 if unknown */ + int associativity; /**< \brief Ways of associativity, + * -1 if fully associative, 0 if unknown */ + hwloc_obj_cache_type_t type; /**< \brief Cache type */ + } cache; + /** \brief Group-specific Object Attributes */ + struct hwloc_group_attr_s { + unsigned depth; /**< \brief Depth of group object */ + } group; + /** \brief PCI Device specific Object Attributes */ + struct hwloc_pcidev_attr_s { + unsigned short domain; + unsigned char bus, dev, func; + unsigned short class_id; + unsigned short vendor_id, device_id, subvendor_id, subdevice_id; + unsigned char revision; + float linkspeed; /* in GB/s */ + } pcidev; + /** \brief Bridge specific Object Attributes */ + struct hwloc_bridge_attr_s { + union { + struct hwloc_pcidev_attr_s pci; + } upstream; + hwloc_obj_bridge_type_t upstream_type; + union { + struct { + unsigned short domain; + unsigned char secondary_bus, subordinate_bus; + } pci; + } downstream; + hwloc_obj_bridge_type_t downstream_type; + unsigned depth; + } bridge; + /** \brief OS Device specific Object Attributes */ + struct hwloc_osdev_attr_s { + hwloc_obj_osdev_type_t type; + } osdev; +}; + +/** \brief Distances between objects + * + * One object may contain a distance structure describing distances + * between all its descendants at a given relative depth. If the + * containing object is the root object of the topology, then the + * distances are available for all objects in the machine. + * + * If the \p latency pointer is not \c NULL, the pointed array contains + * memory latencies (non-zero values), as defined by the ACPI SLIT + * specification. + * + * In the future, some other types of distances may be considered. + * In these cases, \p latency may be \c NULL. + */ +struct hwloc_distances_s { + unsigned relative_depth; /**< \brief Relative depth of the considered objects + * below the object containing this distance information. */ + unsigned nbobjs; /**< \brief Number of objects considered in the matrix. + * It is the number of descendant objects at \p relative_depth + * below the containing object. + * It corresponds to the result of hwloc_get_nbobjs_inside_cpuset_by_depth. */ + + float *latency; /**< \brief Matrix of latencies between objects, stored as a one-dimensional array. + * May be \c NULL if the distances considered here are not latencies. + * Values are normalized to get 1.0 as the minimal value in the matrix. + * Latency from i-th to j-th object is stored in slot i*nbobjs+j. + */ + float latency_max; /**< \brief The maximal value in the latency matrix. */ + float latency_base; /**< \brief The multiplier that should be applied to the latency matrix + * to retrieve the original OS-provided latencies. + * Usually 10 on Linux since ACPI SLIT uses 10 for local latency. + */ +}; + +/** \brief Object info */ +struct hwloc_obj_info_s { + char *name; /**< \brief Info name */ + char *value; /**< \brief Info value */ +}; + +/** @} */ + + + +/** \defgroup hwlocality_creation Topology Creation and Destruction + * @{ + */ + +struct hwloc_topology; +/** \brief Topology context + * + * To be initialized with hwloc_topology_init() and built with hwloc_topology_load(). + */ +typedef struct hwloc_topology * hwloc_topology_t; + +/** \brief Allocate a topology context. + * + * \param[out] topologyp is assigned a pointer to the newly allocated context. + * + * \return 0 on success, -1 on error.
*/ +HWLOC_DECLSPEC int hwloc_topology_init (hwloc_topology_t *topologyp); + +/** \brief Build the actual topology + * + * Build the actual topology once initialized with hwloc_topology_init() and + * tuned with \ref hwlocality_configuration routines. + * No other routine may be called earlier using this topology context. + * + * \param topology is the topology to be loaded with objects. + * + * \return 0 on success, -1 on error. + * + * \note On failure, the topology is reinitialized. It should be either + * destroyed with hwloc_topology_destroy() or configured and loaded again. + * + * \note This function may be called only once per topology. + * + * \sa hwlocality_configuration + */ +HWLOC_DECLSPEC int hwloc_topology_load(hwloc_topology_t topology); + +/** \brief Terminate and free a topology context + * + * \param topology is the topology to be freed + */ +HWLOC_DECLSPEC void hwloc_topology_destroy (hwloc_topology_t topology); + +/** \brief Run internal checks on a topology structure + * + * The program aborts if an inconsistency is detected in the given topology. + * + * \param topology is the topology to be checked + * + * \note This routine is only useful to developers. + * + * \note The input topology should have been previously loaded with + * hwloc_topology_load(). + */ +HWLOC_DECLSPEC void hwloc_topology_check(hwloc_topology_t topology); + +/** @} */ + + + +/** \defgroup hwlocality_configuration Topology Detection Configuration and Query + * + * Several functions can optionally be called between hwloc_topology_init() and + * hwloc_topology_load() to configure how the detection should be performed, + * e.g. to ignore some object types, define a synthetic topology, etc. + * + * If none of them is called, the default is to detect all the objects of the + * machine that the caller is allowed to access. + * + * This default behavior may also be modified through environment variables + * if the application did not modify it already. + * Setting HWLOC_XMLFILE in the environment enforces the discovery from an XML + * file as if hwloc_topology_set_xml() had been called. + * HWLOC_FSROOT switches to reading the topology from the specified Linux + * filesystem root as if hwloc_topology_set_fsroot() had been called. + * Finally, HWLOC_THISSYSTEM enforces the return value of + * hwloc_topology_is_thissystem(). + * + * @{ + */ + +/** \brief Ignore an object type. + * + * Ignore all objects from the given type. + * The bottom-level type HWLOC_OBJ_PU may not be ignored. + * The top-level object of the hierarchy will never be ignored, even if this function + * succeeds. + * I/O objects may not be ignored, topology flags should be used to configure + * their discovery instead. + */ +HWLOC_DECLSPEC int hwloc_topology_ignore_type(hwloc_topology_t topology, hwloc_obj_type_t type); + +/** \brief Ignore an object type if it does not bring any structure. + * + * Ignore all objects from the given type as long as they do not bring any structure: + * Each ignored object should have a single child or be the only child of its parent. + * The bottom-level type HWLOC_OBJ_PU may not be ignored. + * I/O objects may not be ignored, topology flags should be used to configure + * their discovery instead. + */ +HWLOC_DECLSPEC int hwloc_topology_ignore_type_keep_structure(hwloc_topology_t topology, hwloc_obj_type_t type); + +/** \brief Ignore all objects that do not bring any structure.
+ * + * Ignore all objects that do not bring any structure: + * Each ignored object should have a single child or be the only child of its parent. + * I/O objects may not be ignored, topology flags should be used to configure + * their discovery instead. + */ +HWLOC_DECLSPEC int hwloc_topology_ignore_all_keep_structure(hwloc_topology_t topology); + +/** \brief Flags to be set onto a topology context before load. + * + * Flags should be given to hwloc_topology_set_flags(). + * They may also be returned by hwloc_topology_get_flags(). + */ +enum hwloc_topology_flags_e { + /** \brief Detect the whole system, ignore reservations and offline settings. + * + * Gather all resources, even if some were disabled by the administrator. + * For instance, ignore Linux Cpusets and gather all processors and memory nodes, + * and ignore the fact that some resources may be offline. + * \hideinitializer + */ + HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM = (1UL<<0), + + /** \brief Assume that the selected backend provides the topology for the + * system on which we are running. + * + * This forces hwloc_topology_is_thissystem to return 1, i.e. makes hwloc assume that + * the selected backend provides the topology for the system on which we are running, + * even if it is not the OS-specific backend but the XML backend for instance. + * This means making the binding functions actually call the OS-specific + * system calls and really do binding, while the XML backend would otherwise + * provide empty hooks just returning success. + * + * Setting the environment variable HWLOC_THISSYSTEM may also result in the + * same behavior. + * + * This can be used for efficiency reasons to first detect the topology once, + * save it to an XML file, and quickly reload it later through the XML + * backend, but still having binding functions actually do bind. + * \hideinitializer + */ + HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM = (1UL<<1), + + /** \brief Detect PCI devices. + * + * By default, I/O devices are ignored. This flag enables I/O device + * detection using the pci backend. Only the common PCI devices (GPUs, + * NICs, block devices, ...) and host bridges (objects that connect the host + * objects to an I/O subsystem) will be added to the topology. + * Uncommon devices and other bridges (such as PCI-to-PCI bridges) will be + * ignored. + * \hideinitializer + */ + HWLOC_TOPOLOGY_FLAG_IO_DEVICES = (1UL<<2), + + /** \brief Detect PCI bridges. + * + * This flag should be combined with HWLOC_TOPOLOGY_FLAG_IO_DEVICES to enable + * the detection of both common devices and of all useful bridges (bridges that + * have at least one device behind them). + * \hideinitializer + */ + HWLOC_TOPOLOGY_FLAG_IO_BRIDGES = (1UL<<3), + + /** \brief Detect the whole PCI hierarchy. + * + * This flag enables detection of all I/O devices (even the uncommon ones) + * and bridges (even those that have no device behind them) using the pci + * backend. + * \hideinitializer + */ + HWLOC_TOPOLOGY_FLAG_WHOLE_IO = (1UL<<4), + + /** \brief Detect instruction caches. + * + * This flag enables detection of Instruction caches, + * instead of only Data and Unified caches. + * \hideinitializer + */ + HWLOC_TOPOLOGY_FLAG_ICACHES = (1UL<<5) +}; + +/** \brief Set OR'ed flags on a not-yet-loaded topology. + * + * Set an OR'ed set of ::hwloc_topology_flags_e onto a topology that was not yet loaded. + * + * If this function is called multiple times, the last invocation will erase + * and replace the set of flags that was previously set.
+ *
+ * The flags set in a topology may be retrieved with hwloc_topology_get_flags().
+ */
+HWLOC_DECLSPEC int hwloc_topology_set_flags (hwloc_topology_t topology, unsigned long flags);
+
+/** \brief Get OR'ed flags of a topology.
+ *
+ * Get the OR'ed set of ::hwloc_topology_flags_e of a topology.
+ *
+ * \return the flags previously set with hwloc_topology_set_flags().
+ */
+HWLOC_DECLSPEC unsigned long hwloc_topology_get_flags (hwloc_topology_t topology);
+
+/** \brief Change which pid the topology is viewed from
+ *
+ * On some systems, processes may have different views of the machine, for
+ * instance the set of allowed CPUs. By default, hwloc exposes the view from
+ * the current process. Calling hwloc_topology_set_pid() makes it expose the
+ * topology of the machine from the point of view of another process.
+ *
+ * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ *
+ * \note -1 is returned and errno is set to ENOSYS on platforms that do not
+ * support this feature.
+ */
+HWLOC_DECLSPEC int hwloc_topology_set_pid(hwloc_topology_t __hwloc_restrict topology, hwloc_pid_t pid);
+
+/** \brief Change the file-system root path when building the topology from sysfs/procfs.
+ *
+ * On Linux systems, use sysfs and procfs files as if they were mounted on the given
+ * \p fsroot_path instead of the main file-system root. Setting the environment
+ * variable HWLOC_FSROOT may also result in this behavior.
+ * Not using the main file-system root causes hwloc_topology_is_thissystem()
+ * to return 0.
+ *
+ * Note that this function does not actually load topology
+ * information; it just tells hwloc where to load it from. You'll
+ * still need to invoke hwloc_topology_load() to actually load the
+ * topology information.
+ *
+ * \return -1 with errno set to ENOSYS on non-Linux and on Linux systems that
+ * do not support it.
+ * \return -1 with the appropriate errno if \p fsroot_path cannot be used.
+ *
+ * \note For convenience, this backend provides empty binding hooks which just
+ * return success. To have hwloc still actually call OS-specific hooks, the
+ * HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM flag has to be set to assert that the loaded
+ * file is really the underlying system.
+ *
+ * \note On success, the Linux component replaces the previously enabled
+ * component (if any), but the topology is not actually modified until
+ * hwloc_topology_load().
+ */
+HWLOC_DECLSPEC int hwloc_topology_set_fsroot(hwloc_topology_t __hwloc_restrict topology, const char * __hwloc_restrict fsroot_path);
+
+/** \brief Enable synthetic topology.
+ *
+ * Gather topology information from the given \p description,
+ * a space-separated string of numbers describing
+ * the arity of each level.
+ * Each number may be prefixed with a type and a colon to enforce the type
+ * of a level. If only some level types are enforced, hwloc will try to
+ * choose the other types according to usual topologies, but it may fail
+ * and you may have to specify more level types manually.
+ * See also \ref synthetic.
+ *
+ * If \p description was properly parsed and describes a valid topology
+ * configuration, this function returns 0.
+ * Otherwise -1 is returned and errno is set to EINVAL.
+ *
+ * Note that this function does not actually load topology
+ * information; it just tells hwloc where to load it from. You'll
+ * still need to invoke hwloc_topology_load() to actually load the
+ * topology information.
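+ *
+ * For instance, the following sketch describes a machine with 2 NUMA nodes
+ * of 4 cores each, with 2 PUs per core (the description string is shown
+ * purely as a plausible example):
+ * \code
+ * hwloc_topology_set_synthetic(topology, "node:2 core:4 pu:2");
+ * \endcode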
+ * + * \note For convenience, this backend provides empty binding hooks which just + * return success. + * + * \note On success, the synthetic component replaces the previously enabled + * component (if any), but the topology is not actually modified until + * hwloc_topology_load(). + */ +HWLOC_DECLSPEC int hwloc_topology_set_synthetic(hwloc_topology_t __hwloc_restrict topology, const char * __hwloc_restrict description); + +/** \brief Enable XML-file based topology. + * + * Gather topology information from the XML file given at \p xmlpath. + * Setting the environment variable HWLOC_XMLFILE may also result in this behavior. + * This file may have been generated earlier with hwloc_topology_export_xml() + * or lstopo file.xml. + * + * Note that this function does not actually load topology + * information; it just tells hwloc where to load it from. You'll + * still need to invoke hwloc_topology_load() to actually load the + * topology information. + * + * \return -1 with errno set to EINVAL on failure to read the XML file. + * + * \note See also hwloc_topology_set_userdata_import_callback() + * for importing application-specific userdata. + * + * \note For convenience, this backend provides empty binding hooks which just + * return success. To have hwloc still actually call OS-specific hooks, the + * HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM has to be set to assert that the loaded + * file is really the underlying system. + * + * \note On success, the XML component replaces the previously enabled + * component (if any), but the topology is not actually modified until + * hwloc_topology_load(). + */ +HWLOC_DECLSPEC int hwloc_topology_set_xml(hwloc_topology_t __hwloc_restrict topology, const char * __hwloc_restrict xmlpath); + +/** \brief Enable XML based topology using a memory buffer (instead of + * a file, as with hwloc_topology_set_xml()). + * + * Gather topology information from the XML memory buffer given at \p + * buffer and of length \p size. This buffer may have been filled + * earlier with hwloc_topology_export_xmlbuffer(). + * + * Note that this function does not actually load topology + * information; it just tells hwloc where to load it from. You'll + * still need to invoke hwloc_topology_load() to actually load the + * topology information. + * + * \return -1 with errno set to EINVAL on failure to read the XML buffer. + * + * \note See also hwloc_topology_set_userdata_import_callback() + * for importing application-specific userdata. + * + * \note For convenience, this backend provides empty binding hooks which just + * return success. To have hwloc still actually call OS-specific hooks, the + * HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM has to be set to assert that the loaded + * file is really the underlying system. + * + * \note On success, the XML component replaces the previously enabled + * component (if any), but the topology is not actually modified until + * hwloc_topology_load(). + */ +HWLOC_DECLSPEC int hwloc_topology_set_xmlbuffer(hwloc_topology_t __hwloc_restrict topology, const char * __hwloc_restrict buffer, int size); + +/** \brief Prepare the topology for custom assembly. + * + * The topology then contains a single root object. + * It must then be built by inserting other topologies with + * hwloc_custom_insert_topology() or single objects with + * hwloc_custom_insert_group_object_by_parent(). + * hwloc_topology_load() must be called to finalize the new + * topology as usual. 
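+ *
+ * A hedged sketch of assembling two already-loaded topologies \p topo1
+ * and \p topo2 (hypothetical variables; error checking omitted):
+ * \code
+ * hwloc_topology_t custom;
+ * hwloc_topology_init(&custom);
+ * hwloc_topology_set_custom(custom);
+ * hwloc_custom_insert_topology(custom, hwloc_get_root_obj(custom), topo1, NULL);
+ * hwloc_custom_insert_topology(custom, hwloc_get_root_obj(custom), topo2, NULL);
+ * hwloc_topology_load(custom);
+ * \endcode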
+ *
+ * \note If nothing is inserted in the topology,
+ * hwloc_topology_load() will fail with errno set to EINVAL.
+ *
+ * \note The cpuset and nodeset of the root object are NULL because
+ * these sets are meaningless when assembling multiple topologies.
+ *
+ * \note On success, the custom component replaces the previously enabled
+ * component (if any), but the topology is not actually modified until
+ * hwloc_topology_load().
+ */
+HWLOC_DECLSPEC int hwloc_topology_set_custom(hwloc_topology_t topology);
+
+/** \brief Provide a distance matrix.
+ *
+ * Provide the matrix of distances between a set of objects of the given type.
+ * The set may or may not contain all the existing objects of this type.
+ * The objects are specified by their OS/physical index in the \p os_index
+ * array. The \p distances matrix follows the same order.
+ * The distance from object i to object j is stored in slot i*nbobjs+j.
+ *
+ * A single latency matrix may be defined for each type.
+ * If another distance matrix already exists for the given type,
+ * either because the user specified it or because the OS offers it,
+ * it will be replaced by the given one.
+ * If \p nbobjs is \c 0, \p os_index is \c NULL and \p distances is \c NULL,
+ * the existing distance matrix for the given type is removed.
+ *
+ * \note Distance matrices are ignored in multi-node topologies.
+ */
+HWLOC_DECLSPEC int hwloc_topology_set_distance_matrix(hwloc_topology_t __hwloc_restrict topology,
+						      hwloc_obj_type_t type, unsigned nbobjs,
+						      unsigned *os_index, float *distances);
+
+/** \brief Does the topology context come from this system?
+ *
+ * \return 1 if this topology context was built using the system
+ * running this program.
+ * \return 0 instead (for instance if using another file-system root,
+ * an XML topology file, or a synthetic topology).
+ */
+HWLOC_DECLSPEC int hwloc_topology_is_thissystem(hwloc_topology_t __hwloc_restrict topology) __hwloc_attribute_pure;
+
+/** \brief Flags describing actual discovery support for this topology. */
+struct hwloc_topology_discovery_support {
+  /** \brief Detecting the number of PU objects is supported. */
+  unsigned char pu;
+};
+
+/** \brief Flags describing actual PU binding support for this topology. */
+struct hwloc_topology_cpubind_support {
+  /** Binding the whole current process is supported. */
+  unsigned char set_thisproc_cpubind;
+  /** Getting the binding of the whole current process is supported. */
+  unsigned char get_thisproc_cpubind;
+  /** Binding a whole given process is supported. */
+  unsigned char set_proc_cpubind;
+  /** Getting the binding of a whole given process is supported. */
+  unsigned char get_proc_cpubind;
+  /** Binding the current thread only is supported. */
+  unsigned char set_thisthread_cpubind;
+  /** Getting the binding of the current thread only is supported. */
+  unsigned char get_thisthread_cpubind;
+  /** Binding a given thread only is supported. */
+  unsigned char set_thread_cpubind;
+  /** Getting the binding of a given thread only is supported. */
+  unsigned char get_thread_cpubind;
+  /** Getting the last processors where the whole current process ran is supported. */
+  unsigned char get_thisproc_last_cpu_location;
+  /** Getting the last processors where a whole process ran is supported. */
+  unsigned char get_proc_last_cpu_location;
+  /** Getting the last processors where the current thread ran is supported. */
+  unsigned char get_thisthread_last_cpu_location;
+};
+
+/** \brief Flags describing actual memory binding support for this topology.
+ */
+struct hwloc_topology_membind_support {
+  /** Binding the whole current process is supported. */
+  unsigned char set_thisproc_membind;
+  /** Getting the binding of the whole current process is supported. */
+  unsigned char get_thisproc_membind;
+  /** Binding a whole given process is supported. */
+  unsigned char set_proc_membind;
+  /** Getting the binding of a whole given process is supported. */
+  unsigned char get_proc_membind;
+  /** Binding the current thread only is supported. */
+  unsigned char set_thisthread_membind;
+  /** Getting the binding of the current thread only is supported. */
+  unsigned char get_thisthread_membind;
+  /** Binding a given memory area is supported. */
+  unsigned char set_area_membind;
+  /** Getting the binding of a given memory area is supported. */
+  unsigned char get_area_membind;
+  /** Allocating a bound memory area is supported. */
+  unsigned char alloc_membind;
+  /** First-touch policy is supported. */
+  unsigned char firsttouch_membind;
+  /** Bind policy is supported. */
+  unsigned char bind_membind;
+  /** Interleave policy is supported. */
+  unsigned char interleave_membind;
+  /** Replication policy is supported. */
+  unsigned char replicate_membind;
+  /** Next-touch migration policy is supported. */
+  unsigned char nexttouch_membind;
+
+  /** Migrating existing allocated memory is supported. */
+  unsigned char migrate_membind;
+};
+
+/** \brief Set of flags describing actual support for this topology.
+ *
+ * This is retrieved with hwloc_topology_get_support() and will be valid until
+ * the topology object is destroyed. Note: the values are correct only after
+ * discovery.
+ */
+struct hwloc_topology_support {
+  struct hwloc_topology_discovery_support *discovery;
+  struct hwloc_topology_cpubind_support *cpubind;
+  struct hwloc_topology_membind_support *membind;
+};
+
+/** \brief Retrieve the topology support. */
+HWLOC_DECLSPEC const struct hwloc_topology_support *hwloc_topology_get_support(hwloc_topology_t __hwloc_restrict topology);
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_levels Object levels, depths and types
+ * @{
+ *
+ * Be sure to see the figure in \ref termsanddefs that shows a
+ * complete topology tree, including depths, child/sibling/cousin
+ * relationships, and an example of an asymmetric topology where one
+ * socket has fewer caches than its peers.
+ */
+
+/** \brief Get the depth of the hierarchical tree of objects.
+ *
+ * This is the depth of HWLOC_OBJ_PU objects plus one.
+ */
+HWLOC_DECLSPEC unsigned hwloc_topology_get_depth(hwloc_topology_t __hwloc_restrict topology) __hwloc_attribute_pure;
+
+/** \brief Returns the depth of objects of type \p type.
+ *
+ * If no object of this type is present on the underlying architecture, or if
+ * the OS doesn't provide this kind of information, the function returns
+ * HWLOC_TYPE_DEPTH_UNKNOWN.
+ *
+ * If type is absent but a similar type is acceptable, see also
+ * hwloc_get_type_or_below_depth() and hwloc_get_type_or_above_depth().
+ *
+ * If some objects of the given type exist in different levels,
+ * for instance L1 and L2 caches, or L1i and L1d caches,
+ * the function returns HWLOC_TYPE_DEPTH_MULTIPLE.
+ * See hwloc_get_cache_type_depth() in hwloc/helper.h to better handle this
+ * case.
+ *
+ * If an I/O object type is given, the function returns a virtual value
+ * because I/O objects are stored in special levels that are not CPU-related.
+ * This virtual depth may be passed to other hwloc functions such as
+ * hwloc_get_obj_by_depth() but it should not be considered as an actual
+ * depth by the application. In particular, it should not be compared with
+ * any other object depth or with the entire topology depth.
+ */
+HWLOC_DECLSPEC int hwloc_get_type_depth (hwloc_topology_t topology, hwloc_obj_type_t type);
+
+enum hwloc_get_type_depth_e {
+    HWLOC_TYPE_DEPTH_UNKNOWN = -1,    /**< \brief No object of given type exists in the topology. \hideinitializer */
+    HWLOC_TYPE_DEPTH_MULTIPLE = -2,   /**< \brief Objects of given type exist at different depths in the topology. \hideinitializer */
+    HWLOC_TYPE_DEPTH_BRIDGE = -3,     /**< \brief Virtual depth for bridge object level. \hideinitializer */
+    HWLOC_TYPE_DEPTH_PCI_DEVICE = -4, /**< \brief Virtual depth for PCI device object level. \hideinitializer */
+    HWLOC_TYPE_DEPTH_OS_DEVICE = -5   /**< \brief Virtual depth for software device object level. \hideinitializer */
+};
+
+/** \brief Returns the depth of objects of type \p type or below
+ *
+ * If no object of this type is present on the underlying architecture, the
+ * function returns the depth of the first "present" object typically found
+ * inside \p type.
+ *
+ * If some objects of the given type exist in different levels, for instance
+ * L1 and L2 caches, the function returns HWLOC_TYPE_DEPTH_MULTIPLE.
+ */
+static __hwloc_inline int
+hwloc_get_type_or_below_depth (hwloc_topology_t topology, hwloc_obj_type_t type) __hwloc_attribute_pure;
+
+/** \brief Returns the depth of objects of type \p type or above
+ *
+ * If no object of this type is present on the underlying architecture, the
+ * function returns the depth of the first "present" object typically
+ * containing \p type.
+ *
+ * If some objects of the given type exist in different levels, for instance
+ * L1 and L2 caches, the function returns HWLOC_TYPE_DEPTH_MULTIPLE.
+ */
+static __hwloc_inline int
+hwloc_get_type_or_above_depth (hwloc_topology_t topology, hwloc_obj_type_t type) __hwloc_attribute_pure;
+
+/** \brief Returns the type of objects at depth \p depth.
+ *
+ * \return -1 if depth \p depth does not exist.
+ */
+HWLOC_DECLSPEC hwloc_obj_type_t hwloc_get_depth_type (hwloc_topology_t topology, unsigned depth) __hwloc_attribute_pure;
+
+/** \brief Returns the width of level at depth \p depth.
+ */
+HWLOC_DECLSPEC unsigned hwloc_get_nbobjs_by_depth (hwloc_topology_t topology, unsigned depth) __hwloc_attribute_pure;
+
+/** \brief Returns the width of level type \p type
+ *
+ * If no object for that type exists, 0 is returned.
+ * If there are several levels with objects of that type, -1 is returned.
+ */
+static __hwloc_inline int
+hwloc_get_nbobjs_by_type (hwloc_topology_t topology, hwloc_obj_type_t type) __hwloc_attribute_pure;
+
+/** \brief Returns the top-object of the topology-tree.
+ *
+ * Its type is typically ::HWLOC_OBJ_MACHINE but it could be different
+ * for complex topologies.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_get_root_obj (hwloc_topology_t topology) __hwloc_attribute_pure;
+
+/** \brief Returns the topology object at logical index \p idx from depth \p depth */
+HWLOC_DECLSPEC hwloc_obj_t hwloc_get_obj_by_depth (hwloc_topology_t topology, unsigned depth, unsigned idx) __hwloc_attribute_pure;
+
+/** \brief Returns the topology object at logical index \p idx with type \p type
+ *
+ * If no object for that type exists, \c NULL is returned.
+ * If there are several levels with objects of that type, \c NULL is returned
+ * and the caller may fall back to hwloc_get_obj_by_depth().
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_get_obj_by_type (hwloc_topology_t topology, hwloc_obj_type_t type, unsigned idx) __hwloc_attribute_pure;
+
+/** \brief Returns the next object at depth \p depth.
+ *
+ * If \p prev is \c NULL, return the first object at depth \p depth.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_get_next_obj_by_depth (hwloc_topology_t topology, unsigned depth, hwloc_obj_t prev);
+
+/** \brief Returns the next object of type \p type.
+ *
+ * If \p prev is \c NULL, return the first object of type \p type. If
+ * there are multiple levels or no level for the given type, return \c NULL
+ * and let the caller fall back to hwloc_get_next_obj_by_depth().
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_get_next_obj_by_type (hwloc_topology_t topology, hwloc_obj_type_t type,
+			    hwloc_obj_t prev);
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_object_strings Manipulating Object Type, Sets and Attributes as Strings
+ * @{
+ */
+
+/** \brief Return a stringified topology object type */
+HWLOC_DECLSPEC const char * hwloc_obj_type_string (hwloc_obj_type_t type) __hwloc_attribute_const;
+
+/** \brief Return an object type from the string
+ *
+ * \return -1 if unrecognized.
+ */
+HWLOC_DECLSPEC hwloc_obj_type_t hwloc_obj_type_of_string (const char * string) __hwloc_attribute_pure;
+
+/** \brief Stringify the type of a given topology object into a human-readable form.
+ *
+ * It differs from hwloc_obj_type_string() because it prints type attributes such
+ * as cache depth and type.
+ *
+ * If \p size is 0, \p string may safely be \c NULL.
+ *
+ * \return the number of characters that were actually written if not truncating,
+ * or that would have been written (not including the ending \\0).
+ */
+HWLOC_DECLSPEC int hwloc_obj_type_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t obj,
+					   int verbose);
+
+/** \brief Stringify the attributes of a given topology object into a human-readable form.
+ *
+ * Attribute values are separated by \p separator.
+ *
+ * Only the major attributes are printed in non-verbose mode.
+ *
+ * If \p size is 0, \p string may safely be \c NULL.
+ *
+ * \return the number of characters that were actually written if not truncating,
+ * or that would have been written (not including the ending \\0).
+ */
+HWLOC_DECLSPEC int hwloc_obj_attr_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t obj, const char * __hwloc_restrict separator,
+					   int verbose);
+
+/** \brief Stringify the cpuset containing a set of objects.
+ *
+ * If \p size is 0, \p str may safely be \c NULL.
+ *
+ * \return the number of characters that were actually written if not truncating,
+ * or that would have been written (not including the ending \\0).
+ */
+HWLOC_DECLSPEC int hwloc_obj_cpuset_snprintf(char * __hwloc_restrict str, size_t size, size_t nobj, const hwloc_obj_t * __hwloc_restrict objs);
+
+/** \brief Search the given key name in object infos and return the corresponding value.
+ *
+ * If multiple keys match the given name, only the first one is returned.
+ *
+ * \return \c NULL if no such key exists.
+ */
+static __hwloc_inline const char *
+hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name) __hwloc_attribute_pure;
+
+/** \brief Add the given info name and value pair to the given object.
+ *
+ * The info is appended to the existing info array even if another key
+ * with the same name already exists.
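+ *
+ * For example, a hedged sketch (\p obj is assumed to be a valid object
+ * of the loaded topology):
+ * \code
+ * hwloc_obj_add_info(obj, "lstopoStyle", "Background=#ff0000");
+ * \endcode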
+ *
+ * The input strings are copied before being added to the object infos.
+ *
+ * \note This function may be used to enforce object colors in the lstopo
+ * graphical output by using "lstopoStyle" as a name and "Background=#rrggbb"
+ * as a value. See CUSTOM COLORS in the lstopo(1) manpage for details.
+ *
+ * \note If \p value contains some non-printable characters, they will
+ * be dropped when exporting to XML, see hwloc_topology_export_xml().
+ */
+HWLOC_DECLSPEC void hwloc_obj_add_info(hwloc_obj_t obj, const char *name, const char *value);
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_cpubinding CPU binding
+ *
+ * It is often useful to call hwloc_bitmap_singlify() first so that a single CPU
+ * remains in the set. This way, the process will not even migrate between
+ * different CPUs. Some operating systems also only support that kind of binding.
+ *
+ * \note Some operating systems do not provide all hwloc-supported
+ * mechanisms to bind processes, threads, etc. and the corresponding
+ * binding functions may fail. -1 is returned and errno is set to
+ * ENOSYS when it is not possible to bind the requested kind of object
+ * (process/thread). errno is set to EXDEV when the requested cpuset
+ * can not be enforced (e.g. some systems only allow one CPU, and some
+ * other systems only allow one NUMA node).
+ *
+ * The most portable version that should be preferred over the others, whenever
+ * possible, is
+ *
+ * \code
+ * hwloc_set_cpubind(topology, set, 0),
+ * \endcode
+ *
+ * as it just binds the current program, assuming it is single-threaded, or
+ *
+ * \code
+ * hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD),
+ * \endcode
+ *
+ * which binds the current thread of the current program (which may be
+ * multithreaded).
+ *
+ * \note To unbind, just call the binding function with either a full cpuset or
+ * a cpuset equal to the system cpuset.
+ *
+ * \note On some operating systems, CPU binding may have effects on memory binding, see
+ * ::HWLOC_CPUBIND_NOMEMBIND
+ *
+ * Running lstopo --top is a very convenient way to check how binding
+ * actually happened.
+ * @{
+ */
+
+/** \brief Process/Thread binding flags.
+ *
+ * These bit flags can be used to refine the binding policy.
+ *
+ * The default (0) is to bind the current process, assumed to be
+ * single-threaded, in a non-strict way. This is the most portable
+ * way to bind as all operating systems usually provide it.
+ *
+ * \note Not all systems support all kinds of binding. See the
+ * "Detailed Description" section of \ref hwlocality_cpubinding for a
+ * description of errors that can occur.
+ */
+typedef enum {
+  /** \brief Bind all threads of the current (possibly) multithreaded process.
+   * \hideinitializer */
+  HWLOC_CPUBIND_PROCESS = (1<<0),
+
+  /** \brief Bind current thread of current process.
+   * \hideinitializer */
+  HWLOC_CPUBIND_THREAD = (1<<1),
+
+  /** \brief Request for strict binding from the OS.
+   *
+   * By default, when the designated CPUs are all busy while other
+   * CPUs are idle, operating systems may execute the thread/process
+   * on those other CPUs instead of the designated CPUs, to let them
+   * progress anyway. Strict binding means that the thread/process
+   * will _never_ execute on CPUs other than the designated CPUs, even
+   * when those are busy with other tasks and other CPUs are idle.
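+ *
+ * For instance, a hedged sketch requesting strict binding of the
+ * current thread (not guaranteed to succeed on every OS):
+ * \code
+ * if (hwloc_set_cpubind(topology, set,
+ *                       HWLOC_CPUBIND_THREAD | HWLOC_CPUBIND_STRICT) < 0)
+ *   perror("strict cpu binding failed");
+ * \endcode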
+   *
+   * \note Depending on the operating system, strict binding may not
+   * be possible (e.g., the OS does not implement it) or not allowed
+   * (e.g., for administrative reasons), and the function will fail
+   * in that case.
+   *
+   * When retrieving the binding of a process, this flag checks
+   * whether all its threads actually have the same binding. If the
+   * flag is not given, the binding of each thread will be
+   * accumulated.
+   *
+   * \note This flag is meaningless when retrieving the binding of a
+   * thread.
+   * \hideinitializer
+   */
+  HWLOC_CPUBIND_STRICT = (1<<2),
+
+  /** \brief Avoid any effect on memory binding
+   *
+   * On some operating systems, some CPU binding function would also
+   * bind the memory on the corresponding NUMA node. It is often not
+   * a problem for the application, but if it is, setting this flag
+   * will make hwloc avoid using OS functions that would also bind
+   * memory. This will however reduce the support of CPU bindings,
+   * i.e. potentially return -1 with errno set to ENOSYS in some
+   * cases.
+   *
+   * This flag is only meaningful when used with functions that set
+   * the CPU binding. It is ignored when used with functions that get
+   * CPU binding information.
+   * \hideinitializer
+   */
+  HWLOC_CPUBIND_NOMEMBIND = (1<<3)
+} hwloc_cpubind_flags_t;
+
+/** \brief Bind current process or thread on CPUs given in physical bitmap \p set.
+ *
+ * \return -1 with errno set to ENOSYS if the action is not supported
+ * \return -1 with errno set to EXDEV if the binding cannot be enforced
+ */
+HWLOC_DECLSPEC int hwloc_set_cpubind(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags);
+
+/** \brief Get current process or thread binding.
+ *
+ * Writes into \p set the physical cpuset which the process or thread (according to \e
+ * flags) was last bound to.
+ */
+HWLOC_DECLSPEC int hwloc_get_cpubind(hwloc_topology_t topology, hwloc_cpuset_t set, int flags);
+
+/** \brief Bind a process \p pid on CPUs given in physical bitmap \p set.
+ *
+ * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ *
+ * \note As a special case on Linux, if a tid (thread ID) is supplied
+ * instead of a pid (process ID) and HWLOC_CPUBIND_THREAD is passed in flags,
+ * the binding is applied to that specific thread.
+ *
+ * \note On non-Linux systems, HWLOC_CPUBIND_THREAD can not be used in \p flags.
+ */
+HWLOC_DECLSPEC int hwloc_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_cpuset_t set, int flags);
+
+/** \brief Get the current physical binding of process \p pid.
+ *
+ * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ *
+ * \note As a special case on Linux, if a tid (thread ID) is supplied
+ * instead of a pid (process ID) and HWLOC_CPUBIND_THREAD is passed in flags,
+ * the binding for that specific thread is returned.
+ *
+ * \note On non-Linux systems, HWLOC_CPUBIND_THREAD can not be used in \p flags.
+ */
+HWLOC_DECLSPEC int hwloc_get_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags);
+
+#ifdef hwloc_thread_t
+/** \brief Bind a thread \p thread on CPUs given in physical bitmap \p set.
+ *
+ * \note \p hwloc_thread_t is \p pthread_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ *
+ * \note HWLOC_CPUBIND_PROCESS can not be used in \p flags.
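+ *
+ * A sketch for POSIX threads (assuming \p hwloc_thread_t is defined,
+ * i.e. pthreads were available when hwloc.h was included):
+ * \code
+ * hwloc_set_thread_cpubind(topology, pthread_self(), set, 0);
+ * \endcode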
+ */
+HWLOC_DECLSPEC int hwloc_set_thread_cpubind(hwloc_topology_t topology, hwloc_thread_t thread, hwloc_const_cpuset_t set, int flags);
+#endif
+
+#ifdef hwloc_thread_t
+/** \brief Get the current physical binding of thread \p thread.
+ *
+ * \note \p hwloc_thread_t is \p pthread_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ *
+ * \note HWLOC_CPUBIND_PROCESS can not be used in \p flags.
+ */
+HWLOC_DECLSPEC int hwloc_get_thread_cpubind(hwloc_topology_t topology, hwloc_thread_t thread, hwloc_cpuset_t set, int flags);
+#endif
+
+/** \brief Get the last physical CPU where the current process or thread ran.
+ *
+ * The operating system may move some tasks from one processor
+ * to another at any time according to their binding,
+ * so this function may return something that is already
+ * outdated.
+ *
+ * \p flags can include either HWLOC_CPUBIND_PROCESS or HWLOC_CPUBIND_THREAD to
+ * specify whether the query should be for the whole process (union of all CPUs
+ * on which all threads are running), or only the current thread. If the
+ * process is single-threaded, flags can be set to zero to let hwloc use
+ * whichever method is available on the underlying OS.
+ */
+HWLOC_DECLSPEC int hwloc_get_last_cpu_location(hwloc_topology_t topology, hwloc_cpuset_t set, int flags);
+
+/** \brief Get the last physical CPU where a process ran.
+ *
+ * The operating system may move some tasks from one processor
+ * to another at any time according to their binding,
+ * so this function may return something that is already
+ * outdated.
+ *
+ * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ *
+ * \note As a special case on Linux, if a tid (thread ID) is supplied
+ * instead of a pid (process ID) and HWLOC_CPUBIND_THREAD is passed in flags,
+ * the last CPU location of that specific thread is returned.
+ *
+ * \note On non-Linux systems, HWLOC_CPUBIND_THREAD can not be used in \p flags.
+ */
+HWLOC_DECLSPEC int hwloc_get_proc_last_cpu_location(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags);
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_membinding Memory binding
+ *
+ * Memory binding can be done three ways:
+ *
+ * - explicit memory allocation thanks to hwloc_alloc_membind and friends: the
+ *   binding will have effect on the memory allocated by these functions.
+ * - implicit memory binding through binding policy: hwloc_set_membind and
+ *   friends only define the current policy of the process, which will be
+ *   applied to the subsequent calls to malloc() and friends.
+ * - migration of existing memory ranges, thanks to hwloc_set_area_membind()
+ *   and friends, which move already-allocated data.
+ *
+ * \note Not all operating systems support all three ways. Using a binding flag
+ * or policy that is not supported by the underlying OS will cause hwloc's
+ * binding functions to fail and return -1. errno will be set to
+ * ENOSYS when the system does not support the specified action or policy
+ * (e.g., some systems only allow binding memory on a per-thread
+ * basis, whereas other systems only allow binding memory for all
+ * threads in a process). errno will be set to EXDEV when the
+ * requested cpuset can not be enforced (e.g., some systems only allow
+ * binding memory to a single NUMA node).
+ *
+ * The most portable form that should be preferred over the others
+ * whenever possible is as follows:
+ *
+ * \code
+ * hwloc_alloc_membind_policy(topology, size, set,
+ *                            HWLOC_MEMBIND_DEFAULT, 0);
+ * \endcode
+ *
+ * This will allocate some memory hopefully bound to the specified set.
+ * To do so, hwloc will possibly have to change the current memory
+ * binding policy in order to actually get the memory bound, if the OS
+ * does not provide any other way to simply allocate bound memory
+ * without changing the policy for all allocations. That is the
+ * difference with hwloc_alloc_membind(), which will never change the
+ * current memory binding policy. Note that since HWLOC_MEMBIND_STRICT
+ * was not specified, failures to bind will not be reported --
+ * generally, only memory allocation failures will be reported (e.g.,
+ * even a plain malloc() would have failed with ENOMEM).
+ *
+ * Each hwloc memory binding function is available in two forms: one
+ * that takes a CPU set argument and another that takes a NUMA memory
+ * node set argument (see \ref hwlocality_object_sets and \ref
+ * hwlocality_bitmap for a discussion of CPU sets and NUMA memory node
+ * sets). The names of the latter form end with _nodeset. It is also
+ * possible to convert between CPU set and node set using
+ * hwloc_cpuset_to_nodeset() or hwloc_cpuset_from_nodeset().
+ *
+ * \note On some operating systems, memory binding affects the CPU
+ * binding; see ::HWLOC_MEMBIND_NOCPUBIND
+ * @{
+ */
+
+/** \brief Memory binding policy.
+ *
+ * These constants can be used to choose the binding policy. Only one policy can
+ * be used at a time (i.e., the values cannot be OR'ed together).
+ *
+ * \note Not all systems support all kinds of binding. See the
+ * "Detailed Description" section of \ref hwlocality_membinding for a
+ * description of errors that can occur.
+ */
+typedef enum {
+  /** \brief Reset the memory allocation policy to the system default.
+   * \hideinitializer */
+  HWLOC_MEMBIND_DEFAULT = 0,
+
+  /** \brief Allocate memory
+   * but do not immediately bind it to a specific locality. Instead,
+   * each page in the allocation is bound only when it is first
+   * touched. Pages are individually bound to the local NUMA node of
+   * the first thread that touches them. If there is not enough memory
+   * on the node, allocation may be done in the specified cpuset
+   * before allocating on other nodes.
+   * \hideinitializer */
+  HWLOC_MEMBIND_FIRSTTOUCH = 1,
+
+  /** \brief Allocate memory on the specified nodes.
+   * \hideinitializer */
+  HWLOC_MEMBIND_BIND = 2,
+
+  /** \brief Allocate memory on the given nodes in an interleaved
+   * / round-robin manner. The precise layout of the memory across
+   * multiple NUMA nodes is OS/system specific. Interleaving can be
+   * useful when threads distributed across the specified NUMA nodes
+   * will all be accessing the whole memory range concurrently, since
+   * the interleave will then balance the memory references.
+   * \hideinitializer */
+  HWLOC_MEMBIND_INTERLEAVE = 3,
+
+  /** \brief Replicate memory on the given nodes; reads from this
+   * memory will attempt to be serviced from the NUMA node local to
+   * the reading thread. Replicating can be useful when multiple
+   * threads from the specified NUMA nodes will be sharing the same
+   * read-only data.
+   *
+   * This policy can only be used with existing memory allocations
+   * (i.e., the hwloc_set_*membind*() functions); it cannot be used
+   * with functions that allocate new memory (i.e., the hwloc_alloc*()
+   * functions).
+   * \hideinitializer */
+  HWLOC_MEMBIND_REPLICATE = 4,
+
+  /** \brief For each page bound with this policy, by the next time
+   * it is touched (and the next time only), it is moved from its current
+   * location to the local NUMA node of the thread where the memory
+   * reference occurred (if it needs to be moved at all).
+   * \hideinitializer */
+  HWLOC_MEMBIND_NEXTTOUCH = 5,
+
+  /** \brief Returned by hwloc_get_membind*() functions when multiple
+   * threads or parts of a memory area have differing memory binding
+   * policies.
+   * \hideinitializer */
+  HWLOC_MEMBIND_MIXED = -1
+} hwloc_membind_policy_t;
+
+/** \brief Memory binding flags.
+ *
+ * These flags can be used to refine the binding policy. All flags
+ * can be logically OR'ed together with the exception of
+ * HWLOC_MEMBIND_PROCESS and HWLOC_MEMBIND_THREAD; these two flags are
+ * mutually exclusive.
+ *
+ * \note Not all systems support all kinds of binding. See the
+ * "Detailed Description" section of \ref hwlocality_membinding for a
+ * description of errors that can occur.
+ */
+typedef enum {
+  /** \brief Set policy for all threads of the specified (possibly
+   * multithreaded) process. This flag is mutually exclusive with
+   * HWLOC_MEMBIND_THREAD.
+   * \hideinitializer */
+  HWLOC_MEMBIND_PROCESS = (1<<0),
+
+  /** \brief Set policy for a specific thread of the current process.
+   * This flag is mutually exclusive with HWLOC_MEMBIND_PROCESS.
+   * \hideinitializer */
+  HWLOC_MEMBIND_THREAD = (1<<1),
+
+  /** Request strict binding from the OS. The function will fail if
+   * the binding can not be guaranteed / completely enforced.
+   *
+   * This flag has slightly different meanings depending on which
+   * function it is used with.
+   * \hideinitializer */
+  HWLOC_MEMBIND_STRICT = (1<<2),
+
+  /** \brief Migrate existing allocated memory. If the memory cannot
+   * be migrated and the HWLOC_MEMBIND_STRICT flag is passed, an error
+   * will be returned.
+   * \hideinitializer */
+  HWLOC_MEMBIND_MIGRATE = (1<<3),
+
+  /** \brief Avoid any effect on CPU binding.
+   *
+   * On some operating systems, some underlying memory binding
+   * functions also bind the application to the corresponding CPU(s).
+   * Using this flag will cause hwloc to avoid using OS functions that
+   * could potentially affect CPU bindings. Note, however, that using
+   * NOCPUBIND may reduce hwloc's overall memory binding
+   * support. Specifically: some of hwloc's memory binding functions
+   * may fail with errno set to ENOSYS when used with NOCPUBIND.
+   * \hideinitializer
+   */
+  HWLOC_MEMBIND_NOCPUBIND = (1<<4)
+} hwloc_membind_flags_t;
+
+/** \brief Set the default memory binding policy of the current
+ * process or thread to prefer the NUMA node(s) specified by physical \p nodeset.
+ *
+ * If neither HWLOC_MEMBIND_PROCESS nor HWLOC_MEMBIND_THREAD is
+ * specified, the current process is assumed to be single-threaded.
+ * This is the most portable form as it permits hwloc to use either
+ * process-based OS functions or thread-based OS functions, depending
+ * on which are available.
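+ *
+ * A hedged sketch binding subsequent allocations of the current,
+ * assumed single-threaded, process to \p nodeset:
+ * \code
+ * hwloc_set_membind_nodeset(topology, nodeset, HWLOC_MEMBIND_BIND, 0);
+ * \endcode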
+ *
+ * \return -1 with errno set to ENOSYS if the action is not supported
+ * \return -1 with errno set to EXDEV if the binding cannot be enforced
+ */
+HWLOC_DECLSPEC int hwloc_set_membind_nodeset(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags);
+
+/** \brief Set the default memory binding policy of the current
+ * process or thread to prefer the NUMA node(s) near the specified physical \p cpuset.
+ *
+ * If neither HWLOC_MEMBIND_PROCESS nor HWLOC_MEMBIND_THREAD is
+ * specified, the current process is assumed to be single-threaded.
+ * This is the most portable form as it permits hwloc to use either
+ * process-based OS functions or thread-based OS functions, depending
+ * on which are available.
+ *
+ * \return -1 with errno set to ENOSYS if the action is not supported
+ * \return -1 with errno set to EXDEV if the binding cannot be enforced
+ */
+HWLOC_DECLSPEC int hwloc_set_membind(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset, hwloc_membind_policy_t policy, int flags);
+
+/** \brief Query the default memory binding policy and physical locality of the
+ * current process or thread.
+ *
+ * This function has two output parameters: \p nodeset and \p policy.
+ * The values returned in these parameters depend on both the \p flags
+ * passed in and the current memory binding policies and nodesets in
+ * the queried target.
+ *
+ * Passing the HWLOC_MEMBIND_PROCESS flag specifies that the query
+ * target is the current policies and nodesets for all the threads in
+ * the current process. Passing HWLOC_MEMBIND_THREAD specifies that
+ * the query target is the current policy and nodeset for only the
+ * thread invoking this function.
+ *
+ * If neither of these flags is passed (which is the most portable
+ * method), the process is assumed to be single threaded. This allows
+ * hwloc to use either process-based OS functions or thread-based OS
+ * functions, depending on which are available.
+ *
+ * HWLOC_MEMBIND_STRICT is only meaningful when HWLOC_MEMBIND_PROCESS
+ * is also specified. In this case, hwloc will check the default
+ * memory policies and nodesets for all threads in the process. If
+ * they are not identical, -1 is returned and errno is set to EXDEV.
+ * If they are identical, the values are returned in \p nodeset and \p
+ * policy.
+ *
+ * Otherwise, if HWLOC_MEMBIND_PROCESS is specified (and
+ * HWLOC_MEMBIND_STRICT is \em not specified), \p nodeset is set to
+ * the logical OR of all threads' default nodesets. If all threads'
+ * default policies are the same, \p policy is set to that policy. If
+ * they are different, \p policy is set to HWLOC_MEMBIND_MIXED.
+ *
+ * In the HWLOC_MEMBIND_THREAD case (or when neither
+ * HWLOC_MEMBIND_PROCESS nor HWLOC_MEMBIND_THREAD is specified), there
+ * is only one nodeset and policy; they are returned in \p nodeset and
+ * \p policy, respectively.
+ *
+ * If any other flags are specified, -1 is returned and errno is set
+ * to EINVAL.
+ */
+HWLOC_DECLSPEC int hwloc_get_membind_nodeset(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags);
+
+/** \brief Query the default memory binding policy and physical locality of the
+ * current process or thread (the locality is returned in \p cpuset as
+ * CPUs near the locality's actual NUMA node(s)).
+ *
+ * This function has two output parameters: \p cpuset and \p policy.
+ * The values returned in these parameters depend on both the \p flags
+ * passed in and the current memory binding policies and nodesets in
+ * the queried target.
+ *
+ * Passing the HWLOC_MEMBIND_PROCESS flag specifies that the query
+ * target is the current policies and nodesets for all the threads in
+ * the current process. Passing HWLOC_MEMBIND_THREAD specifies that
+ * the query target is the current policy and nodeset for only the
+ * thread invoking this function.
+ *
+ * If neither of these flags is passed (which is the most portable
+ * method), the process is assumed to be single threaded. This allows
+ * hwloc to use either process-based OS functions or thread-based OS
+ * functions, depending on which are available.
+ *
+ * HWLOC_MEMBIND_STRICT is only meaningful when HWLOC_MEMBIND_PROCESS
+ * is also specified. In this case, hwloc will check the default
+ * memory policies and nodesets for all threads in the process. If
+ * they are not identical, -1 is returned and errno is set to EXDEV.
+ * If they are identical, the policy is returned in \p policy. \p
+ * cpuset is set to the union of CPUs near the NUMA node(s) in the
+ * nodeset.
+ *
+ * Otherwise, if HWLOC_MEMBIND_PROCESS is specified (and
+ * HWLOC_MEMBIND_STRICT is \em not specified), the default nodesets
+ * from all threads are logically OR'ed together. \p cpuset is set to
+ * the union of CPUs near the NUMA node(s) in the resulting nodeset.
+ * If all threads' default policies are the same, \p policy is set to
+ * that policy. If they are different, \p policy is set to
+ * HWLOC_MEMBIND_MIXED.
+ *
+ * In the HWLOC_MEMBIND_THREAD case (or when neither
+ * HWLOC_MEMBIND_PROCESS nor HWLOC_MEMBIND_THREAD is specified), there
+ * is only one nodeset and policy. The policy is returned in \p
+ * policy; \p cpuset is set to the union of CPUs near the NUMA node(s)
+ * in the nodeset.
+ *
+ * If any other flags are specified, -1 is returned and errno is set
+ * to EINVAL.
+ */
+HWLOC_DECLSPEC int hwloc_get_membind(hwloc_topology_t topology, hwloc_cpuset_t cpuset, hwloc_membind_policy_t * policy, int flags);
+
+/** \brief Set the default memory binding policy of the specified
+ * process to prefer the NUMA node(s) specified by physical \p nodeset.
+ *
+ * \return -1 with errno set to ENOSYS if the action is not supported
+ * \return -1 with errno set to EXDEV if the binding cannot be enforced
+ *
+ * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ */
+HWLOC_DECLSPEC int hwloc_set_proc_membind_nodeset(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags);
+
+/** \brief Set the default memory binding policy of the specified
+ * process to prefer the NUMA node(s) near the specified physical \p cpuset.
+ *
+ * \return -1 with errno set to ENOSYS if the action is not supported
+ * \return -1 with errno set to EXDEV if the binding cannot be enforced
+ *
+ * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ */
+HWLOC_DECLSPEC int hwloc_set_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_cpuset_t cpuset, hwloc_membind_policy_t policy, int flags);
+
+/** \brief Query the default memory binding policy and physical locality of the
+ * specified process.
+ *
+ * This function has two output parameters: \p nodeset and \p policy.
+ * The values returned in these parameters depend on both the \p flags
+ * passed in and the current memory binding policies and nodesets in
+ * the queried target.
+ *
+ * Passing the HWLOC_MEMBIND_PROCESS flag specifies that the query
+ * target is the current policies and nodesets for all the threads in
+ * the specified process. If HWLOC_MEMBIND_PROCESS is not specified
+ * (which is the most portable method), the process is assumed to be
+ * single threaded. This allows hwloc to use either process-based OS
+ * functions or thread-based OS functions, depending on which are
+ * available.
+ *
+ * Note that it does not make sense to pass HWLOC_MEMBIND_THREAD to
+ * this function.
+ *
+ * If HWLOC_MEMBIND_STRICT is specified, hwloc will check the default
+ * memory policies and nodesets for all threads in the specified
+ * process. If they are not identical, -1 is returned and errno is
+ * set to EXDEV. If they are identical, the values are returned in \p
+ * nodeset and \p policy.
+ *
+ * Otherwise, \p nodeset is set to the logical OR of all threads'
+ * default nodesets. If all threads' default policies are the same, \p
+ * policy is set to that policy. If they are different, \p policy is
+ * set to HWLOC_MEMBIND_MIXED.
+ *
+ * If any other flags are specified, -1 is returned and errno is set
+ * to EINVAL.
+ *
+ * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
+ */
+HWLOC_DECLSPEC int hwloc_get_proc_membind_nodeset(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags);
+
+/** \brief Query the default memory binding policy and physical locality of the
+ * specified process (the locality is returned in \p cpuset as CPUs
+ * near the locality's actual NUMA node(s)).
+ *
+ * This function has two output parameters: \p cpuset and \p policy.
+ * The values returned in these parameters depend on both the \p flags
+ * passed in and the current memory binding policies and nodesets in
+ * the queried target.
+ *
+ * Passing the HWLOC_MEMBIND_PROCESS flag specifies that the query
+ * target is the current policies and nodesets for all the threads in
+ * the specified process. If HWLOC_MEMBIND_PROCESS is not specified
+ * (which is the most portable method), the process is assumed to be
+ * single threaded. This allows hwloc to use either process-based OS
+ * functions or thread-based OS functions, depending on which are
+ * available.
+ *
+ * Note that it does not make sense to pass HWLOC_MEMBIND_THREAD to
+ * this function.
+ *
+ * If HWLOC_MEMBIND_STRICT is specified, hwloc will check the default
+ * memory policies and nodesets for all threads in the specified
+ * process. If they are not identical, -1 is returned and errno is
+ * set to EXDEV. If they are identical, the policy is returned in \p
+ * policy. \p cpuset is set to the union of CPUs near the NUMA
+ * node(s) in the nodeset.
+ *
+ * Otherwise, the default nodesets from all threads are logically OR'ed
+ * together. \p cpuset is set to the union of CPUs near the NUMA
+ * node(s) in the resulting nodeset. If all threads' default policies
+ * are the same, \p policy is set to that policy. If they are
+ * different, \p policy is set to HWLOC_MEMBIND_MIXED.
+ *
+ * If any other flags are specified, -1 is returned and errno is set
+ * to EINVAL.
+ *
+ * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
+ * and \p HANDLE on native Windows platforms.
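+ *
+ * A hedged usage sketch (\p pid is assumed to identify a live process):
+ * \code
+ * hwloc_membind_policy_t policy;
+ * hwloc_cpuset_t cpus = hwloc_bitmap_alloc();
+ * hwloc_get_proc_membind(topology, pid, cpus, &policy, HWLOC_MEMBIND_PROCESS);
+ * hwloc_bitmap_free(cpus);
+ * \endcode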
+ */ +HWLOC_DECLSPEC int hwloc_get_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t cpuset, hwloc_membind_policy_t * policy, int flags); + +/** \brief Bind the already-allocated memory identified by (addr, len) + * to the NUMA node(s) in physical \p nodeset. + * + * \return -1 with errno set to ENOSYS if the action is not supported + * \return -1 with errno set to EXDEV if the binding cannot be enforced + */ +HWLOC_DECLSPEC int hwloc_set_area_membind_nodeset(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); + +/** \brief Bind the already-allocated memory identified by (addr, len) + * to the NUMA node(s) near physical \p cpuset. + * + * \return -1 with errno set to ENOSYS if the action is not supported + * \return -1 with errno set to EXDEV if the binding cannot be enforced + */ +HWLOC_DECLSPEC int hwloc_set_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_cpuset_t cpuset, hwloc_membind_policy_t policy, int flags); + +/** \brief Query the physical NUMA node(s) and binding policy of the memory + * identified by (\p addr, \p len ). + * + * This function has two output parameters: \p nodeset and \p policy. + * The values returned in these parameters depend on both the \p flags + * passed in and the memory binding policies and nodesets of the pages + * in the address range. + * + * If HWLOC_MEMBIND_STRICT is specified, the target pages are first + * checked to see if they all have the same memory binding policy and + * nodeset. If they do not, -1 is returned and errno is set to EXDEV. + * If they are identical across all pages, the nodeset and policy are + * returned in \p nodeset and \p policy, respectively. + * + * If HWLOC_MEMBIND_STRICT is not specified, \p nodeset is set to the + * union of all NUMA node(s) containing pages in the address range. + * If all pages in the target have the same policy, it is returned in + * \p policy. Otherwise, \p policy is set to HWLOC_MEMBIND_MIXED. + * + * If any other flags are specified, -1 is returned and errno is set + * to EINVAL. + */ +HWLOC_DECLSPEC int hwloc_get_area_membind_nodeset(hwloc_topology_t topology, const void *addr, size_t len, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); + +/** \brief Query the CPUs near the physical NUMA node(s) and binding policy of + * the memory identified by (\p addr, \p len ). + * + * This function has two output parameters: \p cpuset and \p policy. + * The values returned in these parameters depend on both the \p flags + * passed in and the memory binding policies and nodesets of the pages + * in the address range. + * + * If HWLOC_MEMBIND_STRICT is specified, the target pages are first + * checked to see if they all have the same memory binding policy and + * nodeset. If they do not, -1 is returned and errno is set to EXDEV. + * If they are identical across all pages, the policy is returned in + * \p policy. \p cpuset is set to the union of CPUs near the NUMA + * node(s) in the nodeset. + * + * If HWLOC_MEMBIND_STRICT is not specified, the union of all NUMA + * node(s) containing pages in the address range is calculated. \p + * cpuset is then set to the CPUs near the NUMA node(s) in this union. + * If all pages in the target have the same policy, it is returned in + * \p policy. Otherwise, \p policy is set to HWLOC_MEMBIND_MIXED. + * + * If any other flags are specified, -1 is returned and errno is set + * to EINVAL. 
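+ *
+ * For instance (a sketch; \p buf and \p buflen are hypothetical and
+ * assumed to describe a valid allocated range):
+ * \code
+ * hwloc_membind_policy_t policy;
+ * hwloc_cpuset_t where = hwloc_bitmap_alloc();
+ * hwloc_get_area_membind(topology, buf, buflen, where, &policy, 0);
+ * hwloc_bitmap_free(where);
+ * \endcode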
+ */
+HWLOC_DECLSPEC int hwloc_get_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_cpuset_t cpuset, hwloc_membind_policy_t * policy, int flags);
+
+/** \brief Allocate some memory
+ *
+ * This is equivalent to malloc(), except that it tries to allocate
+ * page-aligned memory from the OS.
+ *
+ * \note The allocated memory should be freed with hwloc_free().
+ */
+HWLOC_DECLSPEC void *hwloc_alloc(hwloc_topology_t topology, size_t len);
+
+/** \brief Allocate some memory on the given physical nodeset \p nodeset
+ *
+ * \return NULL with errno set to ENOSYS if the action is not supported
+ * and HWLOC_MEMBIND_STRICT is given
+ * \return NULL with errno set to EXDEV if the binding cannot be enforced
+ * and HWLOC_MEMBIND_STRICT is given
+ *
+ * \note The allocated memory should be freed with hwloc_free().
+ */
+HWLOC_DECLSPEC void *hwloc_alloc_membind_nodeset(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) __hwloc_attribute_malloc;
+
+/** \brief Allocate some memory on memory nodes near the given physical cpuset \p cpuset
+ *
+ * \return NULL with errno set to ENOSYS if the action is not supported
+ * and HWLOC_MEMBIND_STRICT is given
+ * \return NULL with errno set to EXDEV if the binding cannot be enforced
+ * and HWLOC_MEMBIND_STRICT is given
+ *
+ * \note The allocated memory should be freed with hwloc_free().
+ */
+HWLOC_DECLSPEC void *hwloc_alloc_membind(hwloc_topology_t topology, size_t len, hwloc_const_cpuset_t cpuset, hwloc_membind_policy_t policy, int flags) __hwloc_attribute_malloc;
+
+/** \brief Allocate some memory on the given nodeset \p nodeset
+ *
+ * This is similar to hwloc_alloc_membind() except that it is allowed to change
+ * the current memory binding policy, thus providing more binding support, at
+ * the expense of changing the current state.
+ */
+static __hwloc_inline void *
+hwloc_alloc_membind_policy_nodeset(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) __hwloc_attribute_malloc;
+
+/** \brief Allocate some memory on the memory nodes near given cpuset \p cpuset
+ *
+ * This is similar to hwloc_alloc_membind_policy_nodeset(), but for a given cpuset.
+ */
+static __hwloc_inline void *
+hwloc_alloc_membind_policy(hwloc_topology_t topology, size_t len, hwloc_const_cpuset_t set, hwloc_membind_policy_t policy, int flags) __hwloc_attribute_malloc;
+
+/** \brief Free memory that was previously allocated by hwloc_alloc()
+ * or hwloc_alloc_membind().
+ */
+HWLOC_DECLSPEC int hwloc_free(hwloc_topology_t topology, void *addr, size_t len);
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_tinker Modifying a loaded Topology
+ * @{
+ */
+
+/** \brief Add a MISC object to the topology
+ *
+ * A new MISC object will be created and inserted into the topology at the
+ * position given by bitmap \p cpuset. This offers a way to add new
+ * intermediate levels to the topology hierarchy.
+ *
+ * \p cpuset and \p name will be copied to set up the new object attributes.
+ *
+ * \return the newly-created object.
+ * \return \c NULL if the insertion conflicts with the existing topology tree.
+ *
+ * \note If \p name contains some non-printable characters, they will
+ * be dropped when exporting to XML, see hwloc_topology_export_xml().
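+ *
+ * For example (a sketch; \p set is assumed to cover the CPUs of one rack):
+ * \code
+ * hwloc_topology_insert_misc_object_by_cpuset(topology, set, "Rack0");
+ * \endcode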
+ */
+HWLOC_DECLSPEC hwloc_obj_t hwloc_topology_insert_misc_object_by_cpuset(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset, const char *name);
+
+/** \brief Add a MISC object as a leaf of the topology
+ *
+ * A new MISC object will be created and inserted into the topology at the
+ * position given by parent. It is appended to the list of existing children,
+ * without ever adding any intermediate hierarchy level. This is useful for
+ * annotating the topology without actually changing the hierarchy.
+ *
+ * \p name will be copied to set up the new object attributes.
+ * However, the new leaf object will not have any \p cpuset.
+ *
+ * \return the newly-created object.
+ *
+ * \note If \p name contains some non-printable characters, they will
+ * be dropped when exporting to XML, see hwloc_topology_export_xml().
+ */
+HWLOC_DECLSPEC hwloc_obj_t hwloc_topology_insert_misc_object_by_parent(hwloc_topology_t topology, hwloc_obj_t parent, const char *name);
+
+/** \brief Flags to be given to hwloc_topology_restrict(). */
+enum hwloc_restrict_flags_e {
+  /** \brief Adapt distance matrices according to objects being removed during restriction.
+   * If this flag is not set, distance matrices are removed.
+   * \hideinitializer
+   */
+  HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES = (1<<0),
+
+  /** \brief Move Misc objects to ancestors if their parents are removed during restriction.
+   * If this flag is not set, Misc objects are removed when their parents are removed.
+   * \hideinitializer
+   */
+  HWLOC_RESTRICT_FLAG_ADAPT_MISC = (1<<1),
+
+  /** \brief Move I/O objects to ancestors if their parents are removed during restriction.
+   * If this flag is not set, I/O devices and bridges are removed when their parents are removed.
+   * \hideinitializer
+   */
+  HWLOC_RESTRICT_FLAG_ADAPT_IO = (1<<2)
+};
+
+/** \brief Restrict the topology to the given CPU set.
+ *
+ * Topology \p topology is modified so as to remove all objects that
+ * are not included (or partially included) in the CPU set \p cpuset.
+ * All objects' CPU and node sets are restricted accordingly.
+ *
+ * \p flags is an OR'ed set of ::hwloc_restrict_flags_e.
+ *
+ * \note This call may not be reverted by restricting back to a larger
+ * cpuset. Once dropped during restriction, objects may not be brought
+ * back, except by loading another topology with hwloc_topology_load().
+ *
+ * \return 0 on success.
+ *
+ * \return -1 with errno set to EINVAL if the input cpuset is invalid.
+ * The topology is not modified in this case.
+ *
+ * \return -1 with errno set to ENOMEM on failure to allocate internal data.
+ * The topology is reinitialized in this case. It should be either
+ * destroyed with hwloc_topology_destroy() or configured and loaded again.
+ */
+HWLOC_DECLSPEC int hwloc_topology_restrict(hwloc_topology_t __hwloc_restrict topology, hwloc_const_cpuset_t cpuset, unsigned long flags);
+
+/** \brief Duplicate a topology.
+ *
+ * The entire topology structure as well as its objects
+ * are duplicated into a new one.
+ *
+ * This is useful for keeping a backup while modifying a topology.
+ */
+HWLOC_DECLSPEC int hwloc_topology_dup(hwloc_topology_t *newtopology, hwloc_topology_t oldtopology);
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_custom Building Custom Topologies
+ *
+ * A custom topology may be initialized by calling hwloc_topology_set_custom()
+ * after hwloc_topology_init(). It may then be modified by inserting objects
+ * or entire topologies. Once done assembling, hwloc_topology_load() should
/** \defgroup hwlocality_custom Building Custom Topologies + * + * A custom topology may be initialized by calling hwloc_topology_set_custom() + * after hwloc_topology_init(). It may then be modified by inserting objects + * or entire topologies. Once done assembling, hwloc_topology_load() should + * be invoked as usual to finalize the topology. + * @{ + */ + +/** \brief Insert an existing topology inside a custom topology + * + * Duplicate the existing topology \p oldtopology inside a new + * custom topology \p newtopology as a leaf of object \p newparent. + * + * If \p oldroot is not \c NULL, duplicate \p oldroot and all its + * children instead of the entire \p oldtopology. Passing the root + * object of \p oldtopology in \p oldroot is equivalent to passing + * \c NULL. + * + * The custom topology \p newtopology must have been prepared with + * hwloc_topology_set_custom() and not loaded with hwloc_topology_load() + * yet. + * + * \p newparent may be either the root of \p newtopology or an object + * that was added through hwloc_custom_insert_group_object_by_parent(). + * + * \note The cpuset and nodeset of the \p newparent object are not + * modified based on the contents of \p oldtopology. + */ +HWLOC_DECLSPEC int hwloc_custom_insert_topology(hwloc_topology_t newtopology, hwloc_obj_t newparent, hwloc_topology_t oldtopology, hwloc_obj_t oldroot); + +/** \brief Insert a new group object inside a custom topology + * + * An object with type ::HWLOC_OBJ_GROUP is inserted as a new child + * of object \p parent. + * + * \p groupdepth is the depth attribute to be given to the new object. + * It may for instance be 0 for top-level groups, 1 for their children, + * and so on. + * + * The custom topology \p topology must have been prepared with + * hwloc_topology_set_custom() and not loaded with hwloc_topology_load() + * yet. + * + * \p parent may be either the root of \p topology or an object that + * was added earlier through hwloc_custom_insert_group_object_by_parent(). + * + * \note The cpuset and nodeset of the new group object are NULL because + * these sets are meaningless when assembling multiple topologies. + * + * \note The cpuset and nodeset of the \p parent object are not modified. + */ +HWLOC_DECLSPEC hwloc_obj_t hwloc_custom_insert_group_object_by_parent(hwloc_topology_t topology, hwloc_obj_t parent, int groupdepth); + +/** @} */ + + + +/** \defgroup hwlocality_xmlexport Exporting Topologies to XML + * @{ + */ + +/** \brief Export the topology into an XML file. + * + * This file may be loaded later through hwloc_topology_set_xml(). + * + * \return -1 if a failure occurred. + * + * \note See also hwloc_topology_set_userdata_export_callback() + * for exporting application-specific userdata. + * + * \note Only printable characters may be exported to XML string attributes. + * Any other character, especially any non-ASCII character, will be silently + * dropped. + */ +HWLOC_DECLSPEC int hwloc_topology_export_xml(hwloc_topology_t topology, const char *xmlpath);
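/*
 * Speculative assembly sketch for the custom-topology functions above:
 * model a two-node cluster by inserting the same machine topology twice
 * under one top-level group. All names are illustrative; machine_topo
 * stands for some already-loaded real topology.
 */
#include <hwloc.h>
#include <stddef.h>

static int build_cluster_topology(hwloc_topology_t machine_topo,
                                  hwloc_topology_t *cluster)
{
  hwloc_obj_t group;
  hwloc_topology_init(cluster);
  hwloc_topology_set_custom(*cluster);
  group = hwloc_custom_insert_group_object_by_parent(*cluster,
              hwloc_get_root_obj(*cluster), 0);
  if (!group)
    return -1;
  hwloc_custom_insert_topology(*cluster, group, machine_topo, NULL);
  hwloc_custom_insert_topology(*cluster, group, machine_topo, NULL);
  return hwloc_topology_load(*cluster); /* finalize as usual */
}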
/** \brief Export the topology into a newly-allocated XML memory buffer. + * + * \p xmlbuffer is allocated by the callee and should be freed with + * hwloc_free_xmlbuffer() later in the caller. + * + * This memory buffer may be loaded later through hwloc_topology_set_xmlbuffer(). + * + * \return -1 if a failure occurred. + * + * \note See also hwloc_topology_set_userdata_export_callback() + * for exporting application-specific userdata. + * + * \note Only printable characters may be exported to XML string attributes. + * Any other character, especially any non-ASCII character, will be silently + * dropped. + */ +HWLOC_DECLSPEC int hwloc_topology_export_xmlbuffer(hwloc_topology_t topology, char **xmlbuffer, int *buflen); + +/** \brief Free a buffer allocated by hwloc_topology_export_xmlbuffer() */ +HWLOC_DECLSPEC void hwloc_free_xmlbuffer(hwloc_topology_t topology, char *xmlbuffer); + +/** \brief Set the application-specific callback for exporting userdata + * + * The object userdata pointer is not exported to XML by default because hwloc + * does not know what it contains. + * + * This function lets applications set \p export_cb to a callback function + * that converts this opaque userdata into an exportable string. + * + * \p export_cb is invoked during XML export for each object whose + * \p userdata pointer is not \c NULL. + * The callback should use hwloc_export_obj_userdata() or + * hwloc_export_obj_userdata_base64() to actually export + * something to XML (possibly multiple times per object). + * + * \p export_cb may be set to \c NULL if userdata should not be exported to XML. + */ +HWLOC_DECLSPEC void hwloc_topology_set_userdata_export_callback(hwloc_topology_t topology, + void (*export_cb)(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj)); + +/** \brief Export some object userdata to XML + * + * This function may only be called from within the export() callback passed + * to hwloc_topology_set_userdata_export_callback(). + * It may be invoked one or multiple times to export some userdata to XML. + * The \p buffer content of length \p length is stored with optional name + * \p name. + * + * When importing this XML file, the import() callback (if set) will be + * called exactly as many times as hwloc_export_obj_userdata() was called + * during export(). It will receive the corresponding \p name, \p buffer + * and \p length arguments. + * + * \p reserved, \p topology and \p obj must be the first three parameters + * that were given to the export callback. + * + * Only printable characters may be exported to XML string attributes. + * If a non-printable character is passed in \p name or \p buffer, + * the function returns -1 with errno set to EINVAL. + * + * If exporting binary data, the application should first encode into + * printable characters only (or use hwloc_export_obj_userdata_base64()). + * It should also take care of portability issues if the export may + * be reimported on a different architecture. + */ +HWLOC_DECLSPEC int hwloc_export_obj_userdata(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj, const char *name, const void *buffer, size_t length); + +/** \brief Encode and export some object userdata to XML + * + * This function is similar to hwloc_export_obj_userdata() but it encodes + * the input buffer into printable characters before exporting. + * On import, decoding is automatically performed before the data is given + * to the import() callback if any. + * + * This function may only be called from within the export() callback passed + * to hwloc_topology_set_userdata_export_callback(). + * + * The function does not take care of portability issues if the export + * may be reimported on a different architecture. + */ +HWLOC_DECLSPEC int hwloc_export_obj_userdata_base64(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj, const char *name, const void *buffer, size_t length);
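/*
 * Tentative sketch showing how the export and import callbacks pair up
 * (the import half is documented just below). The tag name "mydata" and
 * the convention that userdata points to a C string are assumptions of
 * this example, not part of the API.
 */
#include <hwloc.h>
#include <stdlib.h>
#include <string.h>

/* Export half: write each object's string userdata under a tag.
 * Register with hwloc_topology_set_userdata_export_callback() before
 * exporting to XML. */
static void my_export_cb(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj)
{
  const char *s = (const char *) obj->userdata;
  hwloc_export_obj_userdata(reserved, topology, obj, "mydata", s, strlen(s));
}

/* Import half: copy the stored bytes back into obj->userdata.
 * Register with hwloc_topology_set_userdata_import_callback() before
 * hwloc_topology_load(). */
static void my_import_cb(hwloc_topology_t topology, hwloc_obj_t obj,
                         const char *name, const void *buffer, size_t length)
{
  char *copy;
  (void) topology;
  if (!name || strcmp(name, "mydata"))
    return;
  copy = malloc(length + 1);
  if (!copy)
    return;
  memcpy(copy, buffer, length);
  copy[length] = '\0';
  obj->userdata = copy;
}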
/** \brief Set the application-specific callback for importing userdata + * + * On XML import, userdata is ignored by default because hwloc does not know + * how to store it in memory. + * + * This function lets applications set \p import_cb to a callback function + * that will get the XML-stored userdata and store it in the object as expected + * by the application. + * + * \p import_cb is called during hwloc_topology_load() as many times as + * hwloc_export_obj_userdata() was called during export. The topology + * is not entirely set up yet. Object attributes are ready to consult, + * but links between objects are not. + * + * \p import_cb may be \c NULL if userdata should be ignored during import. + * + * \note \p buffer contains \p length characters followed by a null byte ('\0'). + * + * \note This function should be called before hwloc_topology_load(). + */ +HWLOC_DECLSPEC void hwloc_topology_set_userdata_import_callback(hwloc_topology_t topology, + void (*import_cb)(hwloc_topology_t topology, hwloc_obj_t obj, const char *name, const void *buffer, size_t length)); + +/** @} */ + + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +/* high-level helpers */ +#include <hwloc/helper.h> + +/* inline code of some functions above */ +#include <hwloc/inlines.h> + +/* topology diffs */ +#include <hwloc/diff.h> + +/* deprecated headers */ +#include <hwloc/deprecated.h> + +#endif /* HWLOC_H */ diff --git a/ext/hwloc/include/hwloc/autogen/config.h b/ext/hwloc/include/hwloc/autogen/config.h new file mode 100644 index 000000000..06f5d365e --- /dev/null +++ b/ext/hwloc/include/hwloc/autogen/config.h @@ -0,0 +1,191 @@ +/* include/hwloc/autogen/config.h. Generated from config.h.in by configure. */ +/* -*- c -*- + * Copyright © 2009 CNRS + * Copyright © 2009-2010 inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/* The configuration file */ + +#ifndef HWLOC_CONFIG_H +#define HWLOC_CONFIG_H + +#if (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 95)) +# define __hwloc_restrict __restrict +#else +# if __STDC_VERSION__ >= 199901L +# define __hwloc_restrict restrict +# else +# define __hwloc_restrict +# endif +#endif + +/* Note that if we're compiling C++, then just use the "inline" + keyword, since it's part of C++ */ +#if defined(c_plusplus) || defined(__cplusplus) +# define __hwloc_inline inline +#elif defined(_MSC_VER) || defined(__HP_cc) +# define __hwloc_inline __inline +#else +# define __hwloc_inline __inline__ +#endif + +/* + * Note: this is public. We cannot assume anything from the compiler used + * by the application and thus the HWLOC_HAVE_* macros below are not + * fetched from the autoconf result here. We only automatically use a few + * well-known easy cases.
+ */ + +/* Some handy constants to make the logic below a little more readable */ +#if defined(__cplusplus) && \ + (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR >= 4)) +#define GXX_ABOVE_3_4 1 +#else +#define GXX_ABOVE_3_4 0 +#endif + +#if !defined(__cplusplus) && \ + (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 95)) +#define GCC_ABOVE_2_95 1 +#else +#define GCC_ABOVE_2_95 0 +#endif + +#if !defined(__cplusplus) && \ + (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96)) +#define GCC_ABOVE_2_96 1 +#else +#define GCC_ABOVE_2_96 0 +#endif + +#if !defined(__cplusplus) && \ + (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 3)) +#define GCC_ABOVE_3_3 1 +#else +#define GCC_ABOVE_3_3 0 +#endif + +/* Maybe before gcc 2.95 too */ +#ifdef HWLOC_HAVE_ATTRIBUTE_UNUSED +#define __HWLOC_HAVE_ATTRIBUTE_UNUSED HWLOC_HAVE_ATTRIBUTE_UNUSED +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_UNUSED (GXX_ABOVE_3_4 || GCC_ABOVE_2_95) +#else +# define __HWLOC_HAVE_ATTRIBUTE_UNUSED 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_UNUSED +# define __hwloc_attribute_unused __attribute__((__unused__)) +#else +# define __hwloc_attribute_unused +#endif + +#ifdef HWLOC_HAVE_ATTRIBUTE_MALLOC +#define __HWLOC_HAVE_ATTRIBUTE_MALLOC HWLOC_HAVE_ATTRIBUTE_MALLOC +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_MALLOC (GXX_ABOVE_3_4 || GCC_ABOVE_2_96) +#else +# define __HWLOC_HAVE_ATTRIBUTE_MALLOC 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_MALLOC +# define __hwloc_attribute_malloc __attribute__((__malloc__)) +#else +# define __hwloc_attribute_malloc +#endif + +#ifdef HWLOC_HAVE_ATTRIBUTE_CONST +#define __HWLOC_HAVE_ATTRIBUTE_CONST HWLOC_HAVE_ATTRIBUTE_CONST +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_CONST (GXX_ABOVE_3_4 || GCC_ABOVE_2_95) +#else +# define __HWLOC_HAVE_ATTRIBUTE_CONST 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_CONST +# define __hwloc_attribute_const __attribute__((__const__)) +#else +# define __hwloc_attribute_const +#endif + +#ifdef HWLOC_HAVE_ATTRIBUTE_PURE +#define __HWLOC_HAVE_ATTRIBUTE_PURE HWLOC_HAVE_ATTRIBUTE_PURE +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_PURE (GXX_ABOVE_3_4 || GCC_ABOVE_2_96) +#else +# define __HWLOC_HAVE_ATTRIBUTE_PURE 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_PURE +# define __hwloc_attribute_pure __attribute__((__pure__)) +#else +# define __hwloc_attribute_pure +#endif + +#ifdef HWLOC_HAVE_ATTRIBUTE_DEPRECATED +#define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED HWLOC_HAVE_ATTRIBUTE_DEPRECATED +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED (GXX_ABOVE_3_4 || GCC_ABOVE_3_3) +#else +# define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_DEPRECATED +# define __hwloc_attribute_deprecated __attribute__((__deprecated__)) +#else +# define __hwloc_attribute_deprecated +#endif + +#ifdef HWLOC_C_HAVE_VISIBILITY +# if HWLOC_C_HAVE_VISIBILITY +# define HWLOC_DECLSPEC __attribute__((__visibility__("default"))) +# else +# define HWLOC_DECLSPEC +# endif +#else +# define HWLOC_DECLSPEC +#endif + +/* Defined to 1 on Linux */ +#define HWLOC_LINUX_SYS 1 + +/* Defined to 1 if the CPU_SET macro works */ +#define HWLOC_HAVE_CPU_SET 1 + +/* Defined to 1 if you have the `windows.h' header. */ +/* #undef HWLOC_HAVE_WINDOWS_H */ +#define hwloc_pid_t pid_t +#define hwloc_thread_t pthread_t + +#ifdef HWLOC_HAVE_WINDOWS_H + +# include +typedef DWORDLONG hwloc_uint64_t; + +#else /* HWLOC_HAVE_WINDOWS_H */ + +# ifdef hwloc_thread_t +# include +# endif /* hwloc_thread_t */ + +/* Defined to 1 if you have the header file. 
*/ +# define HWLOC_HAVE_STDINT_H 1 + +# include +# ifdef HWLOC_HAVE_STDINT_H +# include +# endif +typedef uint64_t hwloc_uint64_t; + +#endif /* HWLOC_HAVE_WINDOWS_H */ + +/* Whether we need to re-define all the hwloc public symbols or not */ +#define HWLOC_SYM_TRANSFORM 0 + +/* The hwloc symbol prefix */ +#define HWLOC_SYM_PREFIX hwloc_ + +#define HWLOC_HAVE_PCIUTILS 1 + +/* The hwloc symbol prefix in all caps */ +#define HWLOC_SYM_PREFIX_CAPS HWLOC_ + +#endif /* HWLOC_CONFIG_H */ diff --git a/ext/hwloc/include/hwloc/autogen/config.h.in b/ext/hwloc/include/hwloc/autogen/config.h.in new file mode 100644 index 000000000..a30af0c2d --- /dev/null +++ b/ext/hwloc/include/hwloc/autogen/config.h.in @@ -0,0 +1,188 @@ +/* -*- c -*- + * Copyright © 2009 CNRS + * Copyright © 2009-2010 inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/* The configuration file */ + +#ifndef HWLOC_CONFIG_H +#define HWLOC_CONFIG_H + +#if (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 95)) +# define __hwloc_restrict __restrict +#else +# if __STDC_VERSION__ >= 199901L +# define __hwloc_restrict restrict +# else +# define __hwloc_restrict +# endif +#endif + +/* Note that if we're compiling C++, then just use the "inline" + keyword, since it's part of C++ */ +#if defined(c_plusplus) || defined(__cplusplus) +# define __hwloc_inline inline +#elif defined(_MSC_VER) || defined(__HP_cc) +# define __hwloc_inline __inline +#else +# define __hwloc_inline __inline__ +#endif + +/* + * Note: this is public. We can not assume anything from the compiler used + * by the application and thus the HWLOC_HAVE_* macros below are not + * fetched from the autoconf result here. We only automatically use a few + * well-known easy cases. 
+ */ + +/* Some handy constants to make the logic below a little more readable */ +#if defined(__cplusplus) && \ + (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR >= 4)) +#define GXX_ABOVE_3_4 1 +#else +#define GXX_ABOVE_3_4 0 +#endif + +#if !defined(__cplusplus) && \ + (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 95)) +#define GCC_ABOVE_2_95 1 +#else +#define GCC_ABOVE_2_95 0 +#endif + +#if !defined(__cplusplus) && \ + (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96)) +#define GCC_ABOVE_2_96 1 +#else +#define GCC_ABOVE_2_96 0 +#endif + +#if !defined(__cplusplus) && \ + (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 3)) +#define GCC_ABOVE_3_3 1 +#else +#define GCC_ABOVE_3_3 0 +#endif + +/* Maybe before gcc 2.95 too */ +#ifdef HWLOC_HAVE_ATTRIBUTE_UNUSED +#define __HWLOC_HAVE_ATTRIBUTE_UNUSED HWLOC_HAVE_ATTRIBUTE_UNUSED +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_UNUSED (GXX_ABOVE_3_4 || GCC_ABOVE_2_95) +#else +# define __HWLOC_HAVE_ATTRIBUTE_UNUSED 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_UNUSED +# define __hwloc_attribute_unused __attribute__((__unused__)) +#else +# define __hwloc_attribute_unused +#endif + +#ifdef HWLOC_HAVE_ATTRIBUTE_MALLOC +#define __HWLOC_HAVE_ATTRIBUTE_MALLOC HWLOC_HAVE_ATTRIBUTE_MALLOC +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_MALLOC (GXX_ABOVE_3_4 || GCC_ABOVE_2_96) +#else +# define __HWLOC_HAVE_ATTRIBUTE_MALLOC 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_MALLOC +# define __hwloc_attribute_malloc __attribute__((__malloc__)) +#else +# define __hwloc_attribute_malloc +#endif + +#ifdef HWLOC_HAVE_ATTRIBUTE_CONST +#define __HWLOC_HAVE_ATTRIBUTE_CONST HWLOC_HAVE_ATTRIBUTE_CONST +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_CONST (GXX_ABOVE_3_4 || GCC_ABOVE_2_95) +#else +# define __HWLOC_HAVE_ATTRIBUTE_CONST 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_CONST +# define __hwloc_attribute_const __attribute__((__const__)) +#else +# define __hwloc_attribute_const +#endif + +#ifdef HWLOC_HAVE_ATTRIBUTE_PURE +#define __HWLOC_HAVE_ATTRIBUTE_PURE HWLOC_HAVE_ATTRIBUTE_PURE +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_PURE (GXX_ABOVE_3_4 || GCC_ABOVE_2_96) +#else +# define __HWLOC_HAVE_ATTRIBUTE_PURE 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_PURE +# define __hwloc_attribute_pure __attribute__((__pure__)) +#else +# define __hwloc_attribute_pure +#endif + +#ifdef HWLOC_HAVE_ATTRIBUTE_DEPRECATED +#define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED HWLOC_HAVE_ATTRIBUTE_DEPRECATED +#elif defined(__GNUC__) +# define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED (GXX_ABOVE_3_4 || GCC_ABOVE_3_3) +#else +# define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED 0 +#endif +#if __HWLOC_HAVE_ATTRIBUTE_DEPRECATED +# define __hwloc_attribute_deprecated __attribute__((__deprecated__)) +#else +# define __hwloc_attribute_deprecated +#endif + +#ifdef HWLOC_C_HAVE_VISIBILITY +# if HWLOC_C_HAVE_VISIBILITY +# define HWLOC_DECLSPEC __attribute__((__visibility__("default"))) +# else +# define HWLOC_DECLSPEC +# endif +#else +# define HWLOC_DECLSPEC +#endif + +/* Defined to 1 on Linux */ +#undef HWLOC_LINUX_SYS + +/* Defined to 1 if the CPU_SET macro works */ +#undef HWLOC_HAVE_CPU_SET + +/* Defined to 1 if you have the `windows.h' header. */ +#undef HWLOC_HAVE_WINDOWS_H +#undef hwloc_pid_t +#undef hwloc_thread_t + +#ifdef HWLOC_HAVE_WINDOWS_H + +# include +typedef DWORDLONG hwloc_uint64_t; + +#else /* HWLOC_HAVE_WINDOWS_H */ + +# ifdef hwloc_thread_t +# include +# endif /* hwloc_thread_t */ + +/* Defined to 1 if you have the header file. 
*/ +# undef HWLOC_HAVE_STDINT_H + +# include +# ifdef HWLOC_HAVE_STDINT_H +# include +# endif +typedef uint64_t hwloc_uint64_t; + +#endif /* HWLOC_HAVE_WINDOWS_H */ + +/* Whether we need to re-define all the hwloc public symbols or not */ +#undef HWLOC_SYM_TRANSFORM + +/* The hwloc symbol prefix */ +#undef HWLOC_SYM_PREFIX + +/* The hwloc symbol prefix in all caps */ +#undef HWLOC_SYM_PREFIX_CAPS + +#endif /* HWLOC_CONFIG_H */ diff --git a/ext/hwloc/include/hwloc/autogen/stamp-h2 b/ext/hwloc/include/hwloc/autogen/stamp-h2 new file mode 100644 index 000000000..804e0acce --- /dev/null +++ b/ext/hwloc/include/hwloc/autogen/stamp-h2 @@ -0,0 +1 @@ +timestamp for include/hwloc/autogen/config.h diff --git a/ext/hwloc/include/hwloc/bitmap.h b/ext/hwloc/include/hwloc/bitmap.h new file mode 100644 index 000000000..adf589b84 --- /dev/null +++ b/ext/hwloc/include/hwloc/bitmap.h @@ -0,0 +1,350 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2011 inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/** \file + * \brief The bitmap API, for use in hwloc itself. + */ + +#ifndef HWLOC_BITMAP_H +#define HWLOC_BITMAP_H + +#include +#include + + +#ifdef __cplusplus +extern "C" { +#endif + + +/** \defgroup hwlocality_bitmap The bitmap API + * + * The ::hwloc_bitmap_t type represents a set of objects, typically OS + * processors -- which may actually be hardware threads (represented + * by ::hwloc_cpuset_t, which is a typedef for ::hwloc_bitmap_t) -- or + * memory nodes (represented by ::hwloc_nodeset_t, which is also a + * typedef for ::hwloc_bitmap_t). + * + * Both CPU and node sets are always indexed by OS physical number. + * + * \note CPU sets and nodesets are described in \ref hwlocality_object_sets. + * + * A bitmap may be of infinite size. + * @{ + */ + + +/** \brief + * Set of bits represented as an opaque pointer to an internal bitmap. + */ +typedef struct hwloc_bitmap_s * hwloc_bitmap_t; +/** \brief a non-modifiable ::hwloc_bitmap_t */ +typedef const struct hwloc_bitmap_s * hwloc_const_bitmap_t; + + +/* + * Bitmap allocation, freeing and copying. + */ + +/** \brief Allocate a new empty bitmap. + * + * \returns A valid bitmap or \c NULL. + * + * The bitmap should be freed by a corresponding call to + * hwloc_bitmap_free(). + */ +HWLOC_DECLSPEC hwloc_bitmap_t hwloc_bitmap_alloc(void) __hwloc_attribute_malloc; + +/** \brief Allocate a new full bitmap. */ +HWLOC_DECLSPEC hwloc_bitmap_t hwloc_bitmap_alloc_full(void) __hwloc_attribute_malloc; + +/** \brief Free bitmap \p bitmap. + * + * If \p bitmap is \c NULL, no operation is performed. + */ +HWLOC_DECLSPEC void hwloc_bitmap_free(hwloc_bitmap_t bitmap); + +/** \brief Duplicate bitmap \p bitmap by allocating a new bitmap and copying \p bitmap contents. + * + * If \p bitmap is \c NULL, \c NULL is returned. + */ +HWLOC_DECLSPEC hwloc_bitmap_t hwloc_bitmap_dup(hwloc_const_bitmap_t bitmap) __hwloc_attribute_malloc; + +/** \brief Copy the contents of bitmap \p src into the already allocated bitmap \p dst */ +HWLOC_DECLSPEC void hwloc_bitmap_copy(hwloc_bitmap_t dst, hwloc_const_bitmap_t src); + + +/* + * Bitmap/String Conversion + */ + +/** \brief Stringify a bitmap. + * + * Up to \p buflen characters may be written in buffer \p buf. + * + * If \p buflen is 0, \p buf may safely be \c NULL. 
+ * + * \return the number of characters that were actually written if not truncating, + * or that would have been written (not including the ending \\0). + */ +HWLOC_DECLSPEC int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buflen, hwloc_const_bitmap_t bitmap); + +/** \brief Stringify a bitmap into a newly allocated string. + */ +HWLOC_DECLSPEC int hwloc_bitmap_asprintf(char ** strp, hwloc_const_bitmap_t bitmap); + +/** \brief Parse a bitmap string and store it in bitmap \p bitmap. + */ +HWLOC_DECLSPEC int hwloc_bitmap_sscanf(hwloc_bitmap_t bitmap, const char * __hwloc_restrict string); + +/** \brief Stringify a bitmap in the list format. + * + * Lists are comma-separated indexes or ranges. + * Ranges are dash-separated indexes. + * The last range may not have an ending index if the bitmap is infinite. + * + * Up to \p buflen characters may be written in buffer \p buf. + * + * If \p buflen is 0, \p buf may safely be \c NULL. + * + * \return the number of characters that were actually written if not truncating, + * or that would have been written (not including the ending \\0). + */ +HWLOC_DECLSPEC int hwloc_bitmap_list_snprintf(char * __hwloc_restrict buf, size_t buflen, hwloc_const_bitmap_t bitmap); + +/** \brief Stringify a bitmap into a newly allocated list string. + */ +HWLOC_DECLSPEC int hwloc_bitmap_list_asprintf(char ** strp, hwloc_const_bitmap_t bitmap); + +/** \brief Parse a list string and store it in bitmap \p bitmap. + */ +HWLOC_DECLSPEC int hwloc_bitmap_list_sscanf(hwloc_bitmap_t bitmap, const char * __hwloc_restrict string); + +/** \brief Stringify a bitmap in the taskset-specific format. + * + * The taskset command manipulates bitmap strings that contain a single + * (possibly very long) hexadecimal number starting with 0x. + * + * Up to \p buflen characters may be written in buffer \p buf. + * + * If \p buflen is 0, \p buf may safely be \c NULL. + * + * \return the number of characters that were actually written if not truncating, + * or that would have been written (not including the ending \\0). + */ +HWLOC_DECLSPEC int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, size_t buflen, hwloc_const_bitmap_t bitmap); + +/** \brief Stringify a bitmap into a newly allocated taskset-specific string. + */ +HWLOC_DECLSPEC int hwloc_bitmap_taskset_asprintf(char ** strp, hwloc_const_bitmap_t bitmap); + +/** \brief Parse a taskset-specific bitmap string and store it in bitmap \p bitmap. + */ +HWLOC_DECLSPEC int hwloc_bitmap_taskset_sscanf(hwloc_bitmap_t bitmap, const char * __hwloc_restrict string);
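/*
 * Small round-trip sketch of the three string formats above; the
 * expected output for bits 0-3 is shown in the comments.
 * hwloc_bitmap_set_range() is documented just below.
 */
#include <hwloc.h>
#include <stdio.h>

static void bitmap_string_demo(void)
{
  char buf[64];
  hwloc_bitmap_t set = hwloc_bitmap_alloc();
  hwloc_bitmap_set_range(set, 0, 3);                    /* indexes 0-3 */
  hwloc_bitmap_snprintf(buf, sizeof(buf), set);         /* "0x0000000f" */
  printf("hwloc:   %s\n", buf);
  hwloc_bitmap_list_snprintf(buf, sizeof(buf), set);    /* "0-3" */
  printf("list:    %s\n", buf);
  hwloc_bitmap_taskset_snprintf(buf, sizeof(buf), set); /* "0xf" */
  printf("taskset: %s\n", buf);
  hwloc_bitmap_free(set);
}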
/* + * Building bitmaps. + */ + +/** \brief Empty the bitmap \p bitmap */ +HWLOC_DECLSPEC void hwloc_bitmap_zero(hwloc_bitmap_t bitmap); + +/** \brief Fill bitmap \p bitmap with all possible indexes (even if those objects don't exist or are otherwise unavailable) */ +HWLOC_DECLSPEC void hwloc_bitmap_fill(hwloc_bitmap_t bitmap); + +/** \brief Empty the bitmap \p bitmap and add bit \p id */ +HWLOC_DECLSPEC void hwloc_bitmap_only(hwloc_bitmap_t bitmap, unsigned id); + +/** \brief Fill the bitmap \p bitmap and clear the index \p id */ +HWLOC_DECLSPEC void hwloc_bitmap_allbut(hwloc_bitmap_t bitmap, unsigned id); + +/** \brief Setup bitmap \p bitmap from unsigned long \p mask */ +HWLOC_DECLSPEC void hwloc_bitmap_from_ulong(hwloc_bitmap_t bitmap, unsigned long mask); + +/** \brief Setup bitmap \p bitmap from unsigned long \p mask used as \p i -th subset */ +HWLOC_DECLSPEC void hwloc_bitmap_from_ith_ulong(hwloc_bitmap_t bitmap, unsigned i, unsigned long mask); + + +/* + * Modifying bitmaps. + */ + +/** \brief Add index \p id in bitmap \p bitmap */ +HWLOC_DECLSPEC void hwloc_bitmap_set(hwloc_bitmap_t bitmap, unsigned id); + +/** \brief Add indexes from \p begin to \p end in bitmap \p bitmap. + * + * If \p end is \c -1, the range is infinite. + */ +HWLOC_DECLSPEC void hwloc_bitmap_set_range(hwloc_bitmap_t bitmap, unsigned begin, int end); + +/** \brief Replace \p i -th subset of bitmap \p bitmap with unsigned long \p mask */ +HWLOC_DECLSPEC void hwloc_bitmap_set_ith_ulong(hwloc_bitmap_t bitmap, unsigned i, unsigned long mask); + +/** \brief Remove index \p id from bitmap \p bitmap */ +HWLOC_DECLSPEC void hwloc_bitmap_clr(hwloc_bitmap_t bitmap, unsigned id); + +/** \brief Remove indexes from \p begin to \p end in bitmap \p bitmap. + * + * If \p end is \c -1, the range is infinite. + */ +HWLOC_DECLSPEC void hwloc_bitmap_clr_range(hwloc_bitmap_t bitmap, unsigned begin, int end); + +/** \brief Keep a single index among those set in bitmap \p bitmap + * + * May be useful before binding so that the process does not + * have a chance of migrating between multiple logical CPUs + * in the original mask. + */ +HWLOC_DECLSPEC void hwloc_bitmap_singlify(hwloc_bitmap_t bitmap); + + +/* + * Consulting bitmaps. + */ + +/** \brief Convert the beginning part of bitmap \p bitmap into unsigned long \p mask */ +HWLOC_DECLSPEC unsigned long hwloc_bitmap_to_ulong(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure; + +/** \brief Convert the \p i -th subset of bitmap \p bitmap into unsigned long mask */ +HWLOC_DECLSPEC unsigned long hwloc_bitmap_to_ith_ulong(hwloc_const_bitmap_t bitmap, unsigned i) __hwloc_attribute_pure; + +/** \brief Test whether index \p id is part of bitmap \p bitmap */ +HWLOC_DECLSPEC int hwloc_bitmap_isset(hwloc_const_bitmap_t bitmap, unsigned id) __hwloc_attribute_pure; + +/** \brief Test whether bitmap \p bitmap is empty */ +HWLOC_DECLSPEC int hwloc_bitmap_iszero(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure; + +/** \brief Test whether bitmap \p bitmap is completely full */ +HWLOC_DECLSPEC int hwloc_bitmap_isfull(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure; + +/** \brief Compute the first index (least significant bit) in bitmap \p bitmap + * + * \return -1 if no index is set. + */ +HWLOC_DECLSPEC int hwloc_bitmap_first(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure;
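/*
 * Plausible use of the modification helpers above: narrow an allowed
 * set down to exactly one CPU before binding, to avoid migration.
 * Assumes a loaded topology; error handling is abbreviated.
 */
#include <hwloc.h>

static int bind_to_one_cpu(hwloc_topology_t topology, hwloc_const_cpuset_t allowed)
{
  int err;
  hwloc_cpuset_t one = hwloc_bitmap_dup(allowed);
  hwloc_bitmap_singlify(one);   /* keep a single index from the set */
  err = hwloc_set_cpubind(topology, one, HWLOC_CPUBIND_PROCESS);
  hwloc_bitmap_free(one);
  return err;
}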
/** \brief Compute the next index in bitmap \p bitmap which is after index \p prev + * + * If \p prev is -1, the first index is returned. + * + * \return -1 if no index with higher index is set in \p bitmap. + */ +HWLOC_DECLSPEC int hwloc_bitmap_next(hwloc_const_bitmap_t bitmap, int prev) __hwloc_attribute_pure; + +/** \brief Compute the last index (most significant bit) in bitmap \p bitmap + * + * \return -1 if no index is set, or if \p bitmap is infinitely set. + */ +HWLOC_DECLSPEC int hwloc_bitmap_last(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure; + +/** \brief Compute the "weight" of bitmap \p bitmap (i.e., number of + * indexes that are in the bitmap). + * + * \return the number of indexes that are in the bitmap. + */ +HWLOC_DECLSPEC int hwloc_bitmap_weight(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure; + +/** \brief Loop macro iterating on bitmap \p bitmap + * \hideinitializer + * + * \p index is the loop variable; it should be an unsigned int. The + * first iteration will set \p index to the lowest index in the bitmap. + * Successive iterations will iterate through, in order, all remaining + * indexes that are in the bitmap. To be specific: each iteration will return a + * value for \p index such that hwloc_bitmap_isset(bitmap, index) is true. + * + * The assert prevents the loop from being infinite if the bitmap is infinite. + */ +#define hwloc_bitmap_foreach_begin(id, bitmap) \ +do { \ + assert(hwloc_bitmap_weight(bitmap) != -1); \ + for (id = hwloc_bitmap_first(bitmap); \ + (unsigned) id != (unsigned) -1; \ + id = hwloc_bitmap_next(bitmap, id)) { \ +/** \brief End of loop. Needs a terminating ';'. + * \hideinitializer + * + * \sa hwloc_bitmap_foreach_begin */ +#define hwloc_bitmap_foreach_end() \ + } \ +} while (0) + + +/* + * Combining bitmaps. + */ + +/** \brief Or bitmaps \p bitmap1 and \p bitmap2 and store the result in bitmap \p res + * + * \p res can be the same as \p bitmap1 or \p bitmap2 + */ +HWLOC_DECLSPEC void hwloc_bitmap_or (hwloc_bitmap_t res, hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2); + +/** \brief And bitmaps \p bitmap1 and \p bitmap2 and store the result in bitmap \p res + * + * \p res can be the same as \p bitmap1 or \p bitmap2 + */ +HWLOC_DECLSPEC void hwloc_bitmap_and (hwloc_bitmap_t res, hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2); + +/** \brief And bitmap \p bitmap1 and the negation of \p bitmap2 and store the result in bitmap \p res + * + * \p res can be the same as \p bitmap1 or \p bitmap2 + */ +HWLOC_DECLSPEC void hwloc_bitmap_andnot (hwloc_bitmap_t res, hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2); + +/** \brief Xor bitmaps \p bitmap1 and \p bitmap2 and store the result in bitmap \p res + * + * \p res can be the same as \p bitmap1 or \p bitmap2 + */ +HWLOC_DECLSPEC void hwloc_bitmap_xor (hwloc_bitmap_t res, hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2); + +/** \brief Negate bitmap \p bitmap and store the result in bitmap \p res + * + * \p res can be the same as \p bitmap + */ +HWLOC_DECLSPEC void hwloc_bitmap_not (hwloc_bitmap_t res, hwloc_const_bitmap_t bitmap);
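/*
 * Sketch combining the loop macro and the combining operations above:
 * print the indexes common to two bitmaps. assert.h is needed because
 * hwloc_bitmap_foreach_begin() expands to an assert().
 */
#include <hwloc.h>
#include <assert.h>
#include <stdio.h>

static void print_common_indexes(hwloc_const_bitmap_t a, hwloc_const_bitmap_t b)
{
  unsigned id;
  hwloc_bitmap_t common = hwloc_bitmap_alloc();
  hwloc_bitmap_and(common, a, b);          /* intersection of a and b */
  hwloc_bitmap_foreach_begin(id, common)
    printf("index %u is set in both\n", id);
  hwloc_bitmap_foreach_end();
  hwloc_bitmap_free(common);
}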
/* + * Comparing bitmaps. + */ + +/** \brief Test whether bitmaps \p bitmap1 and \p bitmap2 intersect */ +HWLOC_DECLSPEC int hwloc_bitmap_intersects (hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2) __hwloc_attribute_pure; + +/** \brief Test whether bitmap \p sub_bitmap is part of bitmap \p super_bitmap */ +HWLOC_DECLSPEC int hwloc_bitmap_isincluded (hwloc_const_bitmap_t sub_bitmap, hwloc_const_bitmap_t super_bitmap) __hwloc_attribute_pure; + +/** \brief Test whether bitmap \p bitmap1 is equal to bitmap \p bitmap2 */ +HWLOC_DECLSPEC int hwloc_bitmap_isequal (hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2) __hwloc_attribute_pure; + +/** \brief Compare bitmaps \p bitmap1 and \p bitmap2 using their lowest index. + * + * Smaller least significant bit is smaller. + * The empty bitmap is considered higher than anything. + */ +HWLOC_DECLSPEC int hwloc_bitmap_compare_first(hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2) __hwloc_attribute_pure; + +/** \brief Compare bitmaps \p bitmap1 and \p bitmap2 using their highest index. + * + * Higher most significant bit is higher. + * The empty bitmap is considered lower than anything. + */ +HWLOC_DECLSPEC int hwloc_bitmap_compare(hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2) __hwloc_attribute_pure; + +/** @} */ + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_BITMAP_H */ diff --git a/ext/hwloc/include/hwloc/cuda.h b/ext/hwloc/include/hwloc/cuda.h new file mode 100644 index 000000000..25201689e --- /dev/null +++ b/ext/hwloc/include/hwloc/cuda.h @@ -0,0 +1,224 @@ +/* + * Copyright © 2010-2013 Inria. All rights reserved. + * Copyright © 2010-2011 Université Bordeaux 1 + * Copyright © 2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/** \file + * \brief Macros to help interaction between hwloc and the CUDA Driver API. + * + * Applications that use both hwloc and the CUDA Driver API may want to + * include this file so as to get topology information for CUDA devices. + * + */ + +#ifndef HWLOC_CUDA_H +#define HWLOC_CUDA_H + +#include <hwloc.h> +#include <hwloc/autogen/config.h> +#include <hwloc/helper.h> +#ifdef HWLOC_LINUX_SYS +#include <hwloc/linux.h> +#endif + +#include <cuda.h> + + +#ifdef __cplusplus +extern "C" { +#endif + + +/** \defgroup hwlocality_cuda Interoperability with the CUDA Driver API + * + * This interface offers ways to retrieve topology information about + * CUDA devices when using the CUDA Driver API. + * + * @{ + */ + +/** \brief Return the domain, bus and device IDs of the CUDA device \p cudevice. + * + * Device \p cudevice must match the local machine. + */ +static __hwloc_inline int +hwloc_cuda_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unused, + CUdevice cudevice, int *domain, int *bus, int *dev) +{ + CUresult cres; + +#ifdef CU_DEVICE_ATTRIBUTE_PCI_DOMAIN_ID + cres = cuDeviceGetAttribute(domain, CU_DEVICE_ATTRIBUTE_PCI_DOMAIN_ID, cudevice); + if (cres != CUDA_SUCCESS) { + errno = ENOSYS; + return -1; + } +#else + *domain = 0; +#endif + cres = cuDeviceGetAttribute(bus, CU_DEVICE_ATTRIBUTE_PCI_BUS_ID, cudevice); + if (cres != CUDA_SUCCESS) { + errno = ENOSYS; + return -1; + } + cres = cuDeviceGetAttribute(dev, CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID, cudevice); + if (cres != CUDA_SUCCESS) { + errno = ENOSYS; + return -1; + } + + return 0; +} + +/** \brief Get the CPU set of logical processors that are physically + * close to device \p cudevice. + * + * Return the CPU set describing the locality of the CUDA device \p cudevice. + * + * Topology \p topology and device \p cudevice must match the local machine.
+ * I/O devices detection and the CUDA component are not needed in the topology. + * + * The function only returns the locality of the device. + * If more information about the device is needed, OS objects should + * be used instead, see hwloc_cuda_get_device_osdev() + * and hwloc_cuda_get_device_osdev_by_index(). + * + * This function is currently only implemented in a meaningful way for + * Linux; other systems will simply get a full cpuset. + */ +static __hwloc_inline int +hwloc_cuda_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused, + CUdevice cudevice, hwloc_cpuset_t set) +{ +#ifdef HWLOC_LINUX_SYS + /* If we're on Linux, use the sysfs mechanism to get the local cpus */ +#define HWLOC_CUDA_DEVICE_SYSFS_PATH_MAX 128 + char path[HWLOC_CUDA_DEVICE_SYSFS_PATH_MAX]; + FILE *sysfile = NULL; + int domainid, busid, deviceid; + + if (hwloc_cuda_get_device_pci_ids(topology, cudevice, &domainid, &busid, &deviceid)) + return -1; + + if (!hwloc_topology_is_thissystem(topology)) { + errno = EINVAL; + return -1; + } + + sprintf(path, "/sys/bus/pci/devices/%04x:%02x:%02x.0/local_cpus", domainid, busid, deviceid); + sysfile = fopen(path, "r"); + if (!sysfile) + return -1; + + hwloc_linux_parse_cpumap_file(sysfile, set); + if (hwloc_bitmap_iszero(set)) + hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); + + fclose(sysfile); +#else + /* Non-Linux systems simply get a full cpuset */ + hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); +#endif + return 0; +} + +/** \brief Get the hwloc PCI device object corresponding to the + * CUDA device \p cudevice. + * + * Return the PCI device object describing the CUDA device \p cudevice. + * Return NULL if there is none. + * + * Topology \p topology and device \p cudevice must match the local machine. + * I/O devices detection must be enabled in topology \p topology. + * The CUDA component is not needed in the topology. + */ +static __hwloc_inline hwloc_obj_t +hwloc_cuda_get_device_pcidev(hwloc_topology_t topology, CUdevice cudevice) +{ + int domain, bus, dev; + + if (hwloc_cuda_get_device_pci_ids(topology, cudevice, &domain, &bus, &dev)) + return NULL; + + return hwloc_get_pcidev_by_busid(topology, domain, bus, dev, 0); +} + +/** \brief Get the hwloc OS device object corresponding to CUDA device \p cudevice. + * + * Return the hwloc OS device object that describes the given + * CUDA device \p cudevice. Return NULL if there is none. + * + * Topology \p topology and device \p cudevice must match the local machine. + * I/O devices detection and the NVML component must be enabled in the topology. + * If not, the locality of the object may still be found using + * hwloc_cuda_get_device_cpuset(). + * + * \note The corresponding hwloc PCI device may be found by looking + * at the result parent pointer. 
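/*
 * Usage sketch for the Driver-API locality helper above: bind the
 * calling thread near CUDA device 0. Assumes cuInit() already
 * succeeded and the topology matches the local machine; error
 * handling is abbreviated.
 */
#include <hwloc.h>
#include <hwloc/cuda.h>
#include <cuda.h>

static int bind_near_cuda_device0(hwloc_topology_t topology)
{
  CUdevice dev;
  int err = -1;
  hwloc_cpuset_t set = hwloc_bitmap_alloc();
  if (cuDeviceGet(&dev, 0) == CUDA_SUCCESS
      && !hwloc_cuda_get_device_cpuset(topology, dev, set))
    err = hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD);
  hwloc_bitmap_free(set);
  return err;
}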
+ */ +static __hwloc_inline hwloc_obj_t +hwloc_cuda_get_device_osdev(hwloc_topology_t topology, CUdevice cudevice) +{ + hwloc_obj_t osdev = NULL; + int domain, bus, dev; + + if (hwloc_cuda_get_device_pci_ids(topology, cudevice, &domain, &bus, &dev)) + return NULL; + + osdev = NULL; + while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { + hwloc_obj_t pcidev = osdev->parent; + if (strncmp(osdev->name, "cuda", 4)) + continue; + if (pcidev + && pcidev->type == HWLOC_OBJ_PCI_DEVICE + && (int) pcidev->attr->pcidev.domain == domain + && (int) pcidev->attr->pcidev.bus == bus + && (int) pcidev->attr->pcidev.dev == dev + && pcidev->attr->pcidev.func == 0) + return osdev; + } + + return NULL; +} + +/** \brief Get the hwloc OS device object corresponding to the + * CUDA device whose index is \p idx. + * + * Return the OS device object describing the CUDA device whose + * index is \p idx. Return NULL if there is none. + * + * The topology \p topology does not necessarily have to match the current + * machine. For instance the topology may be an XML import of a remote host. + * I/O devices detection and the CUDA component must be enabled in the topology. + * + * \note The corresponding PCI device object can be obtained by looking + * at the OS device parent object. + * + * \note This function is identical to hwloc_cudart_get_device_osdev_by_index(). + */ +static __hwloc_inline hwloc_obj_t +hwloc_cuda_get_device_osdev_by_index(hwloc_topology_t topology, unsigned idx) +{ + hwloc_obj_t osdev = NULL; + while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { + if (HWLOC_OBJ_OSDEV_COPROC == osdev->attr->osdev.type + && osdev->name + && !strncmp("cuda", osdev->name, 4) + && atoi(osdev->name + 4) == (int) idx) + return osdev; + } + return NULL; +} + +/** @} */ + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_CUDA_H */ diff --git a/ext/hwloc/include/hwloc/cudart.h b/ext/hwloc/include/hwloc/cudart.h new file mode 100644 index 000000000..606d2d075 --- /dev/null +++ b/ext/hwloc/include/hwloc/cudart.h @@ -0,0 +1,183 @@ +/* + * Copyright © 2010-2013 Inria. All rights reserved. + * Copyright © 2010-2011 Université Bordeaux 1 + * Copyright © 2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/** \file + * \brief Macros to help interaction between hwloc and the CUDA Runtime API. + * + * Applications that use both hwloc and the CUDA Runtime API may want to + * include this file so as to get topology information for CUDA devices. + * + */ + +#ifndef HWLOC_CUDART_H +#define HWLOC_CUDART_H + +#include +#include +#include +#ifdef HWLOC_LINUX_SYS +#include +#endif + +#include + + +#ifdef __cplusplus +extern "C" { +#endif + + +/** \defgroup hwlocality_cudart Interoperability with the CUDA Runtime API + * + * This interface offers ways to retrieve topology information about + * CUDA devices when using the CUDA Runtime API. + * + * @{ + */ + +/** \brief Return the domain, bus and device IDs of the CUDA device whose index is \p idx. + * + * Device index \p idx must match the local machine. 
+ */ +static __hwloc_inline int +hwloc_cudart_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unused, + int idx, int *domain, int *bus, int *dev) +{ + cudaError_t cerr; + struct cudaDeviceProp prop; + + cerr = cudaGetDeviceProperties(&prop, idx); + if (cerr) { + errno = ENOSYS; + return -1; + } + +#ifdef CU_DEVICE_ATTRIBUTE_PCI_DOMAIN_ID + *domain = prop.pciDomainID; +#else + *domain = 0; +#endif + + *bus = prop.pciBusID; + *dev = prop.pciDeviceID; + + return 0; +} + +/** \brief Get the CPU set of logical processors that are physically + * close to device \p idx. + * + * Return the CPU set describing the locality of the CUDA device + * whose index is \p idx. + * + * Topology \p topology and device \p idx must match the local machine. + * I/O devices detection and the CUDA component are not needed in the topology. + * + * The function only returns the locality of the device. + * If more information about the device is needed, OS objects should + * be used instead, see hwloc_cudart_get_device_osdev_by_index(). + * + * This function is currently only implemented in a meaningful way for + * Linux; other systems will simply get a full cpuset. + */ +static __hwloc_inline int +hwloc_cudart_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused, + int idx, hwloc_cpuset_t set) +{ +#ifdef HWLOC_LINUX_SYS + /* If we're on Linux, use the sysfs mechanism to get the local cpus */ +#define HWLOC_CUDART_DEVICE_SYSFS_PATH_MAX 128 + char path[HWLOC_CUDART_DEVICE_SYSFS_PATH_MAX]; + FILE *sysfile = NULL; + int domain, bus, dev; + + if (hwloc_cudart_get_device_pci_ids(topology, idx, &domain, &bus, &dev)) + return -1; + + if (!hwloc_topology_is_thissystem(topology)) { + errno = EINVAL; + return -1; + } + + sprintf(path, "/sys/bus/pci/devices/%04x:%02x:%02x.0/local_cpus", domain, bus, dev); + sysfile = fopen(path, "r"); + if (!sysfile) + return -1; + + hwloc_linux_parse_cpumap_file(sysfile, set); + if (hwloc_bitmap_iszero(set)) + hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); + + fclose(sysfile); +#else + /* Non-Linux systems simply get a full cpuset */ + hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); +#endif + return 0; +} + +/** \brief Get the hwloc PCI device object corresponding to the + * CUDA device whose index is \p idx. + * + * Return the PCI device object describing the CUDA device whose + * index is \p idx. Return NULL if there is none. + * + * Topology \p topology and device \p idx must match the local machine. + * I/O devices detection must be enabled in topology \p topology. + * The CUDA component is not needed in the topology. + */ +static __hwloc_inline hwloc_obj_t +hwloc_cudart_get_device_pcidev(hwloc_topology_t topology, int idx) +{ + int domain, bus, dev; + + if (hwloc_cudart_get_device_pci_ids(topology, idx, &domain, &bus, &dev)) + return NULL; + + return hwloc_get_pcidev_by_busid(topology, domain, bus, dev, 0); +} + +/** \brief Get the hwloc OS device object corresponding to the + * CUDA device whose index is \p idx. + * + * Return the OS device object describing the CUDA device whose + * index is \p idx. Return NULL if there is none. + * + * The topology \p topology does not necessarily have to match the current + * machine. For instance the topology may be an XML import of a remote host. + * I/O devices detection and the CUDA component must be enabled in the topology. + * If not, the locality of the object may still be found using + * hwloc_cudart_get_device_cpuset(). 
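/*
 * Comparable sketch for the Runtime-API flavor: print the locality of
 * every CUDA device. cudaGetDeviceCount() is the standard CUDA runtime
 * call; the output format is arbitrary.
 */
#include <hwloc.h>
#include <hwloc/cudart.h>
#include <cuda_runtime_api.h>
#include <stdio.h>

static void print_cudart_localities(hwloc_topology_t topology)
{
  int i, count = 0;
  char buf[128];
  hwloc_cpuset_t set = hwloc_bitmap_alloc();
  cudaGetDeviceCount(&count);
  for (i = 0; i < count; i++) {
    if (!hwloc_cudart_get_device_cpuset(topology, i, set)) {
      hwloc_bitmap_list_snprintf(buf, sizeof(buf), set);
      printf("cuda%d is close to PUs %s\n", i, buf);
    }
  }
  hwloc_bitmap_free(set);
}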
+ * + * \note The corresponding PCI device object can be obtained by looking + * at the OS device parent object. + * + * \note This function is identical to hwloc_cuda_get_device_osdev_by_index(). + */ +static __hwloc_inline hwloc_obj_t +hwloc_cudart_get_device_osdev_by_index(hwloc_topology_t topology, unsigned idx) +{ + hwloc_obj_t osdev = NULL; + while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { + if (HWLOC_OBJ_OSDEV_COPROC == osdev->attr->osdev.type + && osdev->name + && !strncmp("cuda", osdev->name, 4) + && atoi(osdev->name + 4) == (int) idx) + return osdev; + } + return NULL; +} + +/** @} */ + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_CUDART_H */ diff --git a/ext/hwloc/include/hwloc/deprecated.h b/ext/hwloc/include/hwloc/deprecated.h new file mode 100644 index 000000000..544ca8f0a --- /dev/null +++ b/ext/hwloc/include/hwloc/deprecated.h @@ -0,0 +1,54 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2013 Inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2009-2010 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/** + * This file contains the inline code of functions declared in hwloc.h + */ + +#ifndef HWLOC_DEPRECATED_H +#define HWLOC_DEPRECATED_H + +#ifndef HWLOC_H +#error Please include the main hwloc.h instead +#endif + +#ifdef __cplusplus +extern "C" { +#endif + +/** \brief Stringify a given topology object into a human-readable form. + * + * \note This function is deprecated in favor of hwloc_obj_type_snprintf() + * and hwloc_obj_attr_snprintf() since it is not very flexible and + * only prints physical/OS indexes. + * + * Fill string \p string up to \p size characters with the description + * of topology object \p obj in topology \p topology. + * + * If \p verbose is set, a longer description is used. Otherwise a + * short description is used. + * + * \p indexprefix is used to prefix the \p os_index attribute number of + * the object in the description. If \c NULL, the \c # character is used. + * + * If \p size is 0, \p string may safely be \c NULL. + * + * \return the number of character that were actually written if not truncating, + * or that would have been written (not including the ending \\0). + */ +HWLOC_DECLSPEC int hwloc_obj_snprintf(char * __hwloc_restrict string, size_t size, + hwloc_topology_t topology, hwloc_obj_t obj, + const char * __hwloc_restrict indexprefix, int verbose); + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_INLINES_H */ diff --git a/ext/hwloc/include/hwloc/diff.h b/ext/hwloc/include/hwloc/diff.h new file mode 100644 index 000000000..59f729657 --- /dev/null +++ b/ext/hwloc/include/hwloc/diff.h @@ -0,0 +1,292 @@ +/* + * Copyright © 2013 Inria. All rights reserved. + * See COPYING in top-level directory. + */ + +/** \file + * \brief Topology differences. + */ + +#ifndef HWLOC_DIFF_H +#define HWLOC_DIFF_H + +#ifndef HWLOC_H +#error Please include the main hwloc.h instead +#endif + + +#ifdef __cplusplus +extern "C" { +#elif 0 +} +#endif + + +/** \defgroup hwlocality_diff Topology differences + * + * Applications that manipulate many similar topologies, for instance + * one for each node of a homogeneous cluster, may want to compress + * topologies to reduce the memory footprint. + * + * This file offers a way to manipulate the difference between topologies + * and export/import it to/from XML. 
+ * Compression may therefore be achieved by storing one topology + * entirely while the others are only described by their differences + * with the former. + * The actual topology can be reconstructed when actually needed by + * applying the precomputed difference to the reference topology. + * + * This interface targets very similar nodes. + * Only very simple differences between topologies are actually + * supported, for instance a change in the memory size, the name + * of the object, or some info attribute. + * More complex differences such as adding or removing objects cannot + * be represented in the difference structures and therefore return + * errors. + * + * @{ + */ + + +/** \brief Type of one object attribute difference. + */ +typedef enum hwloc_topology_diff_obj_attr_type_e { + /** \brief The object local memory is modified. + * The union is a hwloc_topology_diff_obj_attr_uint64_s + * (and the index field is ignored). + */ + HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_SIZE, + + /** \brief The object name is modified. + * The union is a hwloc_topology_diff_obj_attr_string_s + * (and the name field is ignored). + */ + + HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME, + /** \brief the value of an info attribute is modified. + * The union is a hwloc_topology_diff_obj_attr_string_s. + */ + HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO +} hwloc_topology_diff_obj_attr_type_t; + +/** \brief One object attribute difference. + */ +union hwloc_topology_diff_obj_attr_u { + struct hwloc_topology_diff_obj_attr_generic_s { + /* each part of the union must start with these */ + hwloc_topology_diff_obj_attr_type_t type; + } generic; + + /** \brief Integer attribute modification with an optional index. */ + struct hwloc_topology_diff_obj_attr_uint64_s { + /* used for storing integer attributes */ + hwloc_topology_diff_obj_attr_type_t type; + hwloc_uint64_t index; /* not used for SIZE */ + hwloc_uint64_t oldvalue; + hwloc_uint64_t newvalue; + } uint64; + + /** \brief String attribute modification with an optional name */ + struct hwloc_topology_diff_obj_attr_string_s { + /* used for storing name and info pairs */ + hwloc_topology_diff_obj_attr_type_t type; + char *name; /* not used for NAME */ + char *oldvalue; + char *newvalue; + } string; +}; + + +/** \brief Type of one element of a difference list. + */ +typedef enum hwloc_topology_diff_type_e { + /*< \brief An object attribute was changed. + * The union is a hwloc_topology_diff_obj_attr_s. + */ + HWLOC_TOPOLOGY_DIFF_OBJ_ATTR, + + /*< \brief The difference is too complex, + * it cannot be represented. The difference below + * this object has not been checked. + * hwloc_topology_diff_build() will return 1. + * + * The union is a hwloc_topology_diff_too_complex_s. + */ + HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX +} hwloc_topology_diff_type_t; + +/** \brief One element of a difference list between two topologies. + */ +typedef union hwloc_topology_diff_u { + struct hwloc_topology_diff_generic_s { + /* each part of the union must start with these */ + hwloc_topology_diff_type_t type; + union hwloc_topology_diff_u * next; + } generic; + + /* A difference in an object attribute. */ + struct hwloc_topology_diff_obj_attr_s { + hwloc_topology_diff_type_t type; /* must be HWLOC_TOPOLOGY_DIFF_OBJ_ATTR */ + union hwloc_topology_diff_u * next; + /* List of attribute differences for a single object */ + unsigned obj_depth; + unsigned obj_index; + union hwloc_topology_diff_obj_attr_u diff; + } obj_attr; + + /* A difference that is too complex. 
*/ + struct hwloc_topology_diff_too_complex_s { + hwloc_topology_diff_type_t type; /* must be HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX */ + union hwloc_topology_diff_u * next; + /* Where we had to stop computing the diff in the first topology */ + unsigned obj_depth; + unsigned obj_index; + } too_complex; +} * hwloc_topology_diff_t; + + +/** \brief Compute the difference between 2 topologies. + * + * The difference is stored as a list of hwloc_topology_diff_t entries + * starting at \p diff. + * It is computed by doing a depth-first traversal of both topology trees + * simultaneously. + * + * If the difference between 2 objects is too complex to be represented + * (for instance if some objects are added or removed), a special diff + * entry of type HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX is queued. + * The computation of the diff does not continue under these objects. + * So each such diff entry means that the difference between two subtrees + * could not be computed. + * + * \return 0 if the difference can be represented properly. + * + * \return 0 with \p diff pointing NULL if there is no difference between + * the topologies. + * + * \return 1 if the difference is too complex (for instance if some objects are added + * or removed), some entries in the list will be of type HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX + * and 1 is returned. + * + * \return -1 on any other error. + * + * \note \p flags is currently not used. It should be 0. + * + * \note The output diff has to be freed with hwloc_topology_diff_destroy(). + * + * \note The output diff can only be exported to XML or passed to + * hwloc_topology_diff_apply() if 0 was returned, i.e. if no entry of type + * HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX is listed. + * + * \note The output diff may be modified by removing some entries from + * the list. The removed entries should be freed by passing them as a list + * to hwloc_topology_diff_destroy(). +*/ +HWLOC_DECLSPEC int hwloc_topology_diff_build(hwloc_topology_t topology, hwloc_topology_t newtopology, unsigned long flags, hwloc_topology_diff_t *diff); + +/** \brief Flags to be given to hwloc_topology_diff_apply(). + */ +enum hwloc_topology_diff_apply_flags_e { + /** \brief Apply topology diff in reverse direction. + * \hideinitializer + */ + HWLOC_TOPOLOGY_DIFF_APPLY_REVERSE = (1UL<<0) +}; + +/** \brief Apply a topology diff to an existing topology. + * + * \p flags is an OR'ed set of hwloc_topology_diff_apply_flags_e. + * + * The new topology is modified in place. hwloc_topology_dup() + * may be used to duplicate before patching. + * + * If the difference cannot be applied entirely, all previous applied + * portions are unapplied before returning. + * + * \return 0 on success. + * + * \return -N if applying the difference failed while trying + * to apply the N-th part of the difference. For instance -1 + * is returned if the very first difference portion could not + * be applied. + */ +HWLOC_DECLSPEC int hwloc_topology_diff_apply(hwloc_topology_t topology, hwloc_topology_diff_t diff, unsigned long flags); + +/** \brief Destroy a list of topology differences. + * + * \note The \p topology parameter must be a valid topology + * but it is not required that it is related to \p diff. + */ +HWLOC_DECLSPEC int hwloc_topology_diff_destroy(hwloc_topology_t topology, hwloc_topology_diff_t diff); + +/** \brief Load a list of topology differences from a XML file. 
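/*
 * Tentative end-to-end sketch of the diff workflow described above:
 * patch topo1 so that it matches topo2 when the difference is simple
 * enough to be represented. topo1 and topo2 stand for two loaded,
 * similar topologies.
 */
#include <hwloc.h>
#include <stddef.h>

static int sync_topologies(hwloc_topology_t topo1, hwloc_topology_t topo2)
{
  hwloc_topology_diff_t diff = NULL;
  int err = hwloc_topology_diff_build(topo1, topo2, 0, &diff);
  if (!err && diff)                 /* 0 means fully representable */
    err = hwloc_topology_diff_apply(topo1, diff, 0);
  if (diff)
    hwloc_topology_diff_destroy(topo1, diff);
  return err;
}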
+ * + * If not \c NULL, \p refname will be filled with the identifier + * string of the reference topology for the difference file, + * if any was specified in the XML file. + * This identifier is usually the name of the other XML file + * that contains the reference topology. + * + * \note The \p topology parameter must be a valid topology + * but it is not required that it is related to \p diff. + * + * \note the pointer returned in refname should later be freed + * by the caller. + */ +HWLOC_DECLSPEC int hwloc_topology_diff_load_xml(hwloc_topology_t topology, const char *xmlpath, hwloc_topology_diff_t *diff, char **refname); + +/** \brief Export a list of topology differences to a XML file. + * + * If not \c NULL, \p refname defines an identifier string + * for the reference topology which was used as a base when + * computing this difference. + * This identifier is usually the name of the other XML file + * that contains the reference topology. + * This attribute is given back when reading the diff from XML. + * + * \note The \p topology parameter must be a valid topology + * but it is not required that it is related to \p diff. + */ +HWLOC_DECLSPEC int hwloc_topology_diff_export_xml(hwloc_topology_t topology, hwloc_topology_diff_t diff, const char *refname, const char *xmlpath); + +/** \brief Load a list of topology differences from a XML buffer. + * + * If not \c NULL, \p refname will be filled with the identifier + * string of the reference topology for the difference file, + * if any was specified in the XML file. + * This identifier is usually the name of the other XML file + * that contains the reference topology. + * + * \note The \p topology parameter must be a valid topology + * but it is not required that it is related to \p diff. + * + * \note the pointer returned in refname should later be freed + * by the caller. + */ +HWLOC_DECLSPEC int hwloc_topology_diff_load_xmlbuffer(hwloc_topology_t topology, const char *xmlbuffer, int buflen, hwloc_topology_diff_t *diff, char **refname); + +/** \brief Export a list of topology differences to a XML buffer. + * + * If not \c NULL, \p refname defines an identifier string + * for the reference topology which was used as a base when + * computing this difference. + * This identifier is usually the name of the other XML file + * that contains the reference topology. + * This attribute is given back when reading the diff from XML. + * + * \note The XML buffer should later be freed with hwloc_free_xmlbuffer(). + * + * \note The \p topology parameter must be a valid topology + * but it is not required that it is related to \p diff. + */ +HWLOC_DECLSPEC int hwloc_topology_diff_export_xmlbuffer(hwloc_topology_t topology, hwloc_topology_diff_t diff, const char *refname, char **xmlbuffer, int *buflen); + +/** @} */ + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_HELPER_H */ diff --git a/ext/hwloc/include/hwloc/gl.h b/ext/hwloc/include/hwloc/gl.h new file mode 100644 index 000000000..4b8b3f230 --- /dev/null +++ b/ext/hwloc/include/hwloc/gl.h @@ -0,0 +1,135 @@ +/* + * Copyright © 2012 Blue Brain Project, EPFL. All rights reserved. + * Copyright © 2012-2013 Inria. All rights reserved. + * See COPYING in top-level directory. + */ + +/** \file + * \brief Macros to help interaction between hwloc and OpenGL displays. + * + * Applications that use both hwloc and OpenGL may want to include + * this file so as to get topology information for OpenGL displays. 
+ */ + +#ifndef HWLOC_GL_H +#define HWLOC_GL_H + +#include <hwloc.h> + +#include <stdio.h> +#include <string.h> + + +#ifdef __cplusplus +extern "C" { +#endif + + +/** \defgroup hwlocality_gl Interoperability with OpenGL displays + * + * This interface offers ways to retrieve topology information about + * OpenGL displays. + * + * Only the NVIDIA display locality information is currently available, + * using the NV-CONTROL X11 extension and the NVCtrl library. + * + * @{ + */ + +/** \brief Get the hwloc OS device object corresponding to the + * OpenGL display given by port and device index. + * + * Return the OS device object describing the OpenGL display + * whose port (server) is \p port and device (screen) is \p device. + * Return NULL if there is none. + * + * The topology \p topology does not necessarily have to match the current + * machine. For instance the topology may be an XML import of a remote host. + * I/O device detection and the GL component must be enabled in the topology. + * + * \note The corresponding PCI device object can be obtained by looking + * at the OS device parent object. + */ +static __hwloc_inline hwloc_obj_t +hwloc_gl_get_display_osdev_by_port_device(hwloc_topology_t topology, + unsigned port, unsigned device) +{ + unsigned x = (unsigned) -1, y = (unsigned) -1; + hwloc_obj_t osdev = NULL; + while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { + if (HWLOC_OBJ_OSDEV_GPU == osdev->attr->osdev.type + && osdev->name + && sscanf(osdev->name, ":%u.%u", &x, &y) == 2 + && port == x && device == y) + return osdev; + } + errno = EINVAL; + return NULL; +} + +/** \brief Get the hwloc OS device object corresponding to the + * OpenGL display given by name. + * + * Return the OS device object describing the OpenGL display + * whose name is \p name, built as ":port.device" such as ":0.0". + * Return NULL if there is none. + * + * The topology \p topology does not necessarily have to match the current + * machine. For instance the topology may be an XML import of a remote host. + * I/O device detection and the GL component must be enabled in the topology. + * + * \note The corresponding PCI device object can be obtained by looking + * at the OS device parent object. + */ +static __hwloc_inline hwloc_obj_t +hwloc_gl_get_display_osdev_by_name(hwloc_topology_t topology, + const char *name) +{ + hwloc_obj_t osdev = NULL; + while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { + if (HWLOC_OBJ_OSDEV_GPU == osdev->attr->osdev.type + && osdev->name + && !strcmp(name, osdev->name)) + return osdev; + } + errno = EINVAL; + return NULL; +} + +/** \brief Get the OpenGL display port and device corresponding + * to the given hwloc OS object. + * + * Return the OpenGL display port (server) in \p port and device (screen) + * in \p device that correspond to the given hwloc OS device object. + * Return \c -1 if there is none. + * + * The topology \p topology does not necessarily have to match the current + * machine. For instance the topology may be an XML import of a remote host. + * I/O device detection and the GL component must be enabled in the topology.
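+ * + * A short sketch (assuming \p osdev is a GL display OS device object): + * \code + * unsigned port, device; + * if (hwloc_gl_get_display_by_osdev(topology, osdev, &port, &device) == 0) + * printf(":%u.%u\n", port, device); + * \endcode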
+ */ +static __hwloc_inline int +hwloc_gl_get_display_by_osdev(hwloc_topology_t topology __hwloc_attribute_unused, + hwloc_obj_t osdev, + unsigned *port, unsigned *device) +{ + unsigned x = -1, y = -1; + if (HWLOC_OBJ_OSDEV_GPU == osdev->attr->osdev.type + && sscanf(osdev->name, ":%u.%u", &x, &y) == 2) { + *port = x; + *device = y; + return 0; + } + errno = EINVAL; + return -1; +} + +/** @} */ + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_GL_H */ + diff --git a/ext/hwloc/include/hwloc/glibc-sched.h b/ext/hwloc/include/hwloc/glibc-sched.h new file mode 100644 index 000000000..58926ff11 --- /dev/null +++ b/ext/hwloc/include/hwloc/glibc-sched.h @@ -0,0 +1,125 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2013 inria. All rights reserved. + * Copyright © 2009-2011 Université Bordeaux 1 + * Copyright © 2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/** \file + * \brief Macros to help interaction between hwloc and glibc scheduling routines. + * + * Applications that use both hwloc and glibc scheduling routines such as + * sched_getaffinity() or pthread_attr_setaffinity_np() may want to include + * this file so as to ease conversion between their respective types. + */ + +#ifndef HWLOC_GLIBC_SCHED_H +#define HWLOC_GLIBC_SCHED_H + +#include <hwloc.h> +#include <hwloc/helper.h> +#include <assert.h> + +#if !defined _GNU_SOURCE || !defined _SCHED_H || (!defined CPU_SETSIZE && !defined sched_priority) +#error Please make sure to include sched.h before including glibc-sched.h, and define _GNU_SOURCE before any inclusion of sched.h +#endif + + +#ifdef __cplusplus +extern "C" { +#endif + + +#ifdef HWLOC_HAVE_CPU_SET + + +/** \defgroup hwlocality_glibc_sched Interoperability with glibc sched affinity + * + * This interface offers ways to convert between hwloc cpusets and glibc cpusets + * such as those manipulated by sched_getaffinity() or pthread_attr_setaffinity_np(). + * + * \note Topology \p topology must match the current machine. + * + * @{ + */ + + +/** \brief Convert hwloc CPU set \p toposet into glibc sched affinity CPU set \p schedset + * + * This function may be used before calling sched_setaffinity or any other function + * that takes a cpu_set_t as input parameter. + * + * \p schedsetsize should be sizeof(cpu_set_t) unless \p schedset was dynamically allocated with CPU_ALLOC + */ +static __hwloc_inline int +hwloc_cpuset_to_glibc_sched_affinity(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t hwlocset, + cpu_set_t *schedset, size_t schedsetsize) +{ +#ifdef CPU_ZERO_S + unsigned cpu; + CPU_ZERO_S(schedsetsize, schedset); + hwloc_bitmap_foreach_begin(cpu, hwlocset) + CPU_SET_S(cpu, schedsetsize, schedset); + hwloc_bitmap_foreach_end(); +#else /* !CPU_ZERO_S */ + unsigned cpu; + CPU_ZERO(schedset); + assert(schedsetsize == sizeof(cpu_set_t)); + hwloc_bitmap_foreach_begin(cpu, hwlocset) + CPU_SET(cpu, schedset); + hwloc_bitmap_foreach_end(); +#endif /* !CPU_ZERO_S */ + return 0; +} + +/** \brief Convert glibc sched affinity CPU set \p schedset into hwloc CPU set + * + * This function may be used after calling sched_getaffinity() or any other function + * that fills a cpu_set_t.
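+ * + * For example (a sketch assuming a statically-sized cpu_set_t): + * \code + * cpu_set_t schedset; + * hwloc_bitmap_t hwlocset = hwloc_bitmap_alloc(); + * sched_getaffinity(0, sizeof(schedset), &schedset); + * hwloc_cpuset_from_glibc_sched_affinity(topology, hwlocset, &schedset, sizeof(schedset)); + * \endcode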
+ * + * \p schedsetsize should be sizeof(cpu_set_t) unless \p schedset was dynamically allocated with CPU_ALLOC + */ +static __hwloc_inline int +hwloc_cpuset_from_glibc_sched_affinity(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_cpuset_t hwlocset, + const cpu_set_t *schedset, size_t schedsetsize) +{ + int cpu; +#ifdef CPU_ZERO_S + int count; +#endif + hwloc_bitmap_zero(hwlocset); +#ifdef CPU_ZERO_S + count = CPU_COUNT_S(schedsetsize, schedset); + cpu = 0; + while (count) { + if (CPU_ISSET_S(cpu, schedsetsize, schedset)) { + hwloc_bitmap_set(hwlocset, cpu); + count--; + } + cpu++; + } +#else /* !CPU_ZERO_S */ + /* sched.h does not support dynamic cpu_set_t (introduced in glibc 2.7), + * assume we have a very old interface without CPU_COUNT (added in 2.6) + */ + assert(schedsetsize == sizeof(cpu_set_t)); + for(cpu=0; cpu<CPU_SETSIZE; cpu++) + if (CPU_ISSET(cpu, schedset)) + hwloc_bitmap_set(hwlocset, cpu); +#endif /* !CPU_ZERO_S */ + return 0; +} + +/** @} */ + + +#endif /* HWLOC_HAVE_CPU_SET */ + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_GLIBC_SCHED_H */ diff --git a/ext/hwloc/include/hwloc/helper.h b/ext/hwloc/include/hwloc/helper.h new file mode 100644 --- /dev/null +++ b/ext/hwloc/include/hwloc/helper.h +/** \file + * \brief High-level hwloc traversal helpers. + */ + +#ifndef HWLOC_HELPER_H +#define HWLOC_HELPER_H + +#ifndef HWLOC_H +#error Please include the main hwloc.h instead +#endif + +#include <stdlib.h> +#include <errno.h> + + +#ifdef __cplusplus +extern "C" { +#endif + + +/** \defgroup hwlocality_helper_find_inside Finding Objects inside a CPU set + * @{ + */ + +/** \brief Get the first largest object included in the given cpuset \p set. + * + * \return the first object that is included in \p set and whose parent is not. + * + * This is convenient for iterating over all largest objects within a CPU set + * by doing a loop getting the first largest object and clearing its CPU set + * from the remaining CPU set. + * + * \note This function cannot work if the root object does not have a CPU set, + * e.g. if the topology is made of different machines. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_first_largest_obj_inside_cpuset(hwloc_topology_t topology, hwloc_const_cpuset_t set) +{ + hwloc_obj_t obj = hwloc_get_root_obj(topology); + if (!obj->cpuset || !hwloc_bitmap_intersects(obj->cpuset, set)) + return NULL; + while (!hwloc_bitmap_isincluded(obj->cpuset, set)) { + /* while the object intersects without being included, look at its children */ + hwloc_obj_t child = obj->first_child; + while (child) { + if (child->cpuset && hwloc_bitmap_intersects(child->cpuset, set)) + break; + child = child->next_sibling; + } + if (!child) + /* no child intersects, return their parent */ + return obj; + /* found one intersecting child, look at its children */ + obj = child; + } + /* obj is included, return it */ + return obj; +} + +/** \brief Get the set of largest objects covering exactly a given cpuset \p set + * + * \return the number of objects returned in \p objs. + * + * \note This function cannot work if the root object does not have a CPU set, + * e.g. if the topology is made of different machines. + */ +HWLOC_DECLSPEC int hwloc_get_largest_objs_inside_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set, + hwloc_obj_t * __hwloc_restrict objs, int max); + +/** \brief Return the next object at depth \p depth included in CPU set \p set. + * + * If \p prev is \c NULL, return the first object at depth \p depth + * included in \p set. The next invocation should pass the previous + * return value in \p prev so as to obtain the next object in \p set. + * + * \note This function cannot work if objects at the given depth do + * not have CPU sets or if the topology is made of different machines.
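+ * + * For instance, a sketch iterating over all objects at depth \p depth that are fully contained in \p set: + * \code + * hwloc_obj_t obj = NULL; + * unsigned n = 0; + * while ((obj = hwloc_get_next_obj_inside_cpuset_by_depth(topology, set, depth, obj)) != NULL) + * n++; + * \endcode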
+ */ +static __hwloc_inline hwloc_obj_t +hwloc_get_next_obj_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, + unsigned depth, hwloc_obj_t prev) +{ + hwloc_obj_t next = hwloc_get_next_obj_by_depth(topology, depth, prev); + if (!next || !next->cpuset) + return NULL; + while (next && !hwloc_bitmap_isincluded(next->cpuset, set)) + next = next->next_cousin; + return next; +} + +/** \brief Return the next object of type \p type included in CPU set \p set. + * + * If there is no depth, or multiple depths, for the given type, return \c NULL + * and let the caller fall back to + * hwloc_get_next_obj_inside_cpuset_by_depth(). + * + * \note This function cannot work if objects of the given type do + * not have CPU sets or if the topology is made of different machines. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_next_obj_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, + hwloc_obj_type_t type, hwloc_obj_t prev) +{ + int depth = hwloc_get_type_depth(topology, type); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE) + return NULL; + return hwloc_get_next_obj_inside_cpuset_by_depth(topology, set, depth, prev); +} + +/** \brief Return the (logically) \p idx -th object at depth \p depth included in CPU set \p set. + * + * \note This function cannot work if objects at the given depth do + * not have CPU sets or if the topology is made of different machines. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, + unsigned depth, unsigned idx) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, + unsigned depth, unsigned idx) +{ + hwloc_obj_t obj = hwloc_get_obj_by_depth (topology, depth, 0); + unsigned count = 0; + if (!obj || !obj->cpuset) + return NULL; + while (obj) { + if (hwloc_bitmap_isincluded(obj->cpuset, set)) { + if (count == idx) + return obj; + count++; + } + obj = obj->next_cousin; + } + return NULL; +} + +/** \brief Return the \p idx -th object of type \p type included in CPU set \p set. + * + * If there is no depth, or multiple depths, for the given type, return \c NULL + * and let the caller fall back to + * hwloc_get_obj_inside_cpuset_by_depth(). + * + * \note This function cannot work if objects of the given type do + * not have CPU sets or if the topology is made of different machines. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, + hwloc_obj_type_t type, unsigned idx) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, + hwloc_obj_type_t type, unsigned idx) +{ + int depth = hwloc_get_type_depth(topology, type); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE) + return NULL; + return hwloc_get_obj_inside_cpuset_by_depth(topology, set, depth, idx); +} + +/** \brief Return the number of objects at depth \p depth included in CPU set \p set. + * + * \note This function cannot work if objects at the given depth do + * not have CPU sets or if the topology is made of different machines.
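+ * + * A short sketch, counting the PUs inside a binding \p set: + * \code + * int pu_depth = hwloc_get_type_depth(topology, HWLOC_OBJ_PU); + * unsigned n = hwloc_get_nbobjs_inside_cpuset_by_depth(topology, set, pu_depth); + * \endcode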
+ */ +static __hwloc_inline unsigned +hwloc_get_nbobjs_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, + unsigned depth) __hwloc_attribute_pure; +static __hwloc_inline unsigned +hwloc_get_nbobjs_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, + unsigned depth) +{ + hwloc_obj_t obj = hwloc_get_obj_by_depth (topology, depth, 0); + unsigned count = 0; + if (!obj || !obj->cpuset) + return 0; + while (obj) { + if (hwloc_bitmap_isincluded(obj->cpuset, set)) + count++; + obj = obj->next_cousin; + } + return count; +} + +/** \brief Return the number of objects of type \p type included in CPU set \p set. + * + * If no object for that type exists inside CPU set \p set, 0 is + * returned. If there are several levels with objects of that type + * inside CPU set \p set, -1 is returned. + * + * \note This function cannot work if objects of the given type do + * not have CPU sets or if the topology is made of different machines. + */ +static __hwloc_inline int +hwloc_get_nbobjs_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, + hwloc_obj_type_t type) __hwloc_attribute_pure; +static __hwloc_inline int +hwloc_get_nbobjs_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, + hwloc_obj_type_t type) +{ + int depth = hwloc_get_type_depth(topology, type); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) + return 0; + if (depth == HWLOC_TYPE_DEPTH_MULTIPLE) + return -1; /* FIXME: aggregate nbobjs from different levels? */ + return hwloc_get_nbobjs_inside_cpuset_by_depth(topology, set, depth); +} + +/** \brief Return the logical index among the objects included in CPU set \p set. + * + * Consult all objects in the same level as \p obj and inside CPU set \p set + * in the logical order, and return the index of \p obj within them. + * If \p set covers the entire topology, this is the logical index of \p obj. + * Otherwise, this is similar to a logical index within the part of the topology + * defined by CPU set \p set. + */ +static __hwloc_inline int +hwloc_get_obj_index_inside_cpuset (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t set, + hwloc_obj_t obj) __hwloc_attribute_pure; +static __hwloc_inline int +hwloc_get_obj_index_inside_cpuset (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t set, + hwloc_obj_t obj) +{ + int idx = 0; + if (!hwloc_bitmap_isincluded(obj->cpuset, set)) + return -1; + /* count how many objects are inside the cpuset on the way from us to the beginning of the level */ + while ((obj = obj->prev_cousin) != NULL) + if (hwloc_bitmap_isincluded(obj->cpuset, set)) + idx++; + return idx; +} + +/** @} */ + + + +/** \defgroup hwlocality_helper_find_covering Finding Objects covering at least CPU set + * @{ + */ + +/** \brief Get the child covering at least CPU set \p set. + * + * \return \c NULL if no child matches or if \p set is empty. + * + * \note This function cannot work if \p parent does not have a CPU set.
+ */ +static __hwloc_inline hwloc_obj_t +hwloc_get_child_covering_cpuset (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t set, + hwloc_obj_t parent) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_child_covering_cpuset (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t set, + hwloc_obj_t parent) +{ + hwloc_obj_t child; + if (!parent->cpuset || hwloc_bitmap_iszero(set)) + return NULL; + child = parent->first_child; + while (child) { + if (child->cpuset && hwloc_bitmap_isincluded(set, child->cpuset)) + return child; + child = child->next_sibling; + } + return NULL; +} + +/** \brief Get the lowest object covering at least CPU set \p set + * + * \return \c NULL if no object matches or if \p set is empty. + * + * \note This function cannot work if the root object does not have a CPU set, + * e.g. if the topology is made of different machines. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_covering_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_covering_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set) +{ + struct hwloc_obj *current = hwloc_get_root_obj(topology); + if (hwloc_bitmap_iszero(set) || !current->cpuset || !hwloc_bitmap_isincluded(set, current->cpuset)) + return NULL; + while (1) { + hwloc_obj_t child = hwloc_get_child_covering_cpuset(topology, set, current); + if (!child) + return current; + current = child; + } +} + +/** \brief Iterate through same-depth objects covering at least CPU set \p set + * + * If object \p prev is \c NULL, return the first object at depth \p + * depth covering at least part of CPU set \p set. The next + * invocation should pass the previous return value in \p prev so as + * to obtain the next object covering at least another part of \p set. + * + * \note This function cannot work if objects at the given depth do + * not have CPU sets or if the topology is made of different machines. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_next_obj_covering_cpuset_by_depth(hwloc_topology_t topology, hwloc_const_cpuset_t set, + unsigned depth, hwloc_obj_t prev) +{ + hwloc_obj_t next = hwloc_get_next_obj_by_depth(topology, depth, prev); + if (!next || !next->cpuset) + return NULL; + while (next && !hwloc_bitmap_intersects(set, next->cpuset)) + next = next->next_cousin; + return next; +} + +/** \brief Iterate through same-type objects covering at least CPU set \p set + * + * If object \p prev is \c NULL, return the first object of type \p + * type covering at least part of CPU set \p set. The next invocation + * should pass the previous return value in \p prev so as to obtain + * the next object of type \p type covering at least another part of + * \p set. + * + * If there are no or multiple depths for type \p type, \c NULL is returned. + * The caller may fallback to hwloc_get_next_obj_covering_cpuset_by_depth() + * for each depth. + * + * \note This function cannot work if objects of the given type do + * not have CPU sets or if the topology is made of different machines.
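+ * + * For instance, a sketch counting the sockets that intersect \p set: + * \code + * hwloc_obj_t obj = NULL; + * unsigned n = 0; + * while ((obj = hwloc_get_next_obj_covering_cpuset_by_type(topology, set, + * HWLOC_OBJ_SOCKET, obj)) != NULL) + * n++; + * \endcode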
+ */ +static __hwloc_inline hwloc_obj_t +hwloc_get_next_obj_covering_cpuset_by_type(hwloc_topology_t topology, hwloc_const_cpuset_t set, + hwloc_obj_type_t type, hwloc_obj_t prev) +{ + int depth = hwloc_get_type_depth(topology, type); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE) + return NULL; + return hwloc_get_next_obj_covering_cpuset_by_depth(topology, set, depth, prev); +} + +/** @} */ + + + +/** \defgroup hwlocality_helper_ancestors Looking at Ancestor and Child Objects + * @{ + * + * Be sure to see the figure in \ref termsanddefs that shows a + * complete topology tree, including depths, child/sibling/cousin + * relationships, and an example of an asymmetric topology where one + * socket has fewer caches than its peers. + */ + +/** \brief Returns the ancestor object of \p obj at depth \p depth. */ +static __hwloc_inline hwloc_obj_t +hwloc_get_ancestor_obj_by_depth (hwloc_topology_t topology __hwloc_attribute_unused, unsigned depth, hwloc_obj_t obj) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_ancestor_obj_by_depth (hwloc_topology_t topology __hwloc_attribute_unused, unsigned depth, hwloc_obj_t obj) +{ + hwloc_obj_t ancestor = obj; + if (obj->depth < depth) + return NULL; + while (ancestor && ancestor->depth > depth) + ancestor = ancestor->parent; + return ancestor; +} + +/** \brief Returns the ancestor object of \p obj with type \p type. */ +static __hwloc_inline hwloc_obj_t +hwloc_get_ancestor_obj_by_type (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_type_t type, hwloc_obj_t obj) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_ancestor_obj_by_type (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_type_t type, hwloc_obj_t obj) +{ + hwloc_obj_t ancestor = obj->parent; + while (ancestor && ancestor->type != type) + ancestor = ancestor->parent; + return ancestor; +} + +/** \brief Returns the common parent object to objects \p obj1 and \p obj2 */ +static __hwloc_inline hwloc_obj_t +hwloc_get_common_ancestor_obj (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj1, hwloc_obj_t obj2) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_common_ancestor_obj (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj1, hwloc_obj_t obj2) +{ + /* the loop isn't so easy since intermediate ancestors may have + * different depth, causing us to alternate between using obj1->parent + * and obj2->parent. Also, even if at some point we find ancestors + * of the same depth, their ancestors may have different depth again. + */ + while (obj1 != obj2) { + while (obj1->depth > obj2->depth) + obj1 = obj1->parent; + while (obj2->depth > obj1->depth) + obj2 = obj2->parent; + if (obj1 != obj2 && obj1->depth == obj2->depth) { + obj1 = obj1->parent; + obj2 = obj2->parent; + } + } + return obj1; +} + +/** \brief Returns true if \p obj is inside the subtree beginning with ancestor object \p subtree_root. + * + * \note This function assumes that both \p obj and \p subtree_root have a \p cpuset. + */ +static __hwloc_inline int +hwloc_obj_is_in_subtree (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj, hwloc_obj_t subtree_root) __hwloc_attribute_pure; +static __hwloc_inline int +hwloc_obj_is_in_subtree (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj, hwloc_obj_t subtree_root) +{ + return hwloc_bitmap_isincluded(obj->cpuset, subtree_root->cpuset); +} + +/** \brief Return the next child.
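+ * + * For example, a sketch iterating over all children of \p parent: + * \code + * hwloc_obj_t child = NULL; + * unsigned n = 0; + * while ((child = hwloc_get_next_child(topology, parent, child)) != NULL) + * n++; + * \endcode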
+ * + * If \p prev is \c NULL, return the first child. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_next_child (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t parent, hwloc_obj_t prev) +{ + if (!prev) + return parent->first_child; + if (prev->parent != parent) + return NULL; + return prev->next_sibling; +} + +/** @} */ + + + +/** \defgroup hwlocality_helper_find_cache Looking at Cache Objects + * @{ + */ + +/** \brief Find the depth of cache objects matching cache depth and type. + * + * Return the depth of the topology level that contains cache objects + * whose attributes match \p cachelevel and \p cachetype. This function + * intends to disambiguate the case where hwloc_get_type_depth() returns + * \p HWLOC_TYPE_DEPTH_MULTIPLE. + * + * If no cache level matches, \p HWLOC_TYPE_DEPTH_UNKNOWN is returned. + * + * If \p cachetype is \p HWLOC_OBJ_CACHE_UNIFIED, the depth of the + * unique matching unified cache level is returned. + * + * If \p cachetype is \p HWLOC_OBJ_CACHE_DATA or \p HWLOC_OBJ_CACHE_INSTRUCTION, + * the depth of either a matching cache level or a unified cache level is returned. + * + * If \p cachetype is \c -1, it is ignored and multiple levels may + * match. The function returns either the depth of a uniquely matching + * level or \p HWLOC_TYPE_DEPTH_MULTIPLE. + */ +static __hwloc_inline int +hwloc_get_cache_type_depth (hwloc_topology_t topology, + unsigned cachelevel, hwloc_obj_cache_type_t cachetype) +{ + int depth; + int found = HWLOC_TYPE_DEPTH_UNKNOWN; + for (depth=0; ; depth++) { + hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, 0); + if (!obj) + break; + if (obj->type != HWLOC_OBJ_CACHE || obj->attr->cache.depth != cachelevel) + /* doesn't match, try next depth */ + continue; + if (cachetype == (hwloc_obj_cache_type_t) -1) { + if (found != HWLOC_TYPE_DEPTH_UNKNOWN) { + /* second match, return MULTIPLE */ + return HWLOC_TYPE_DEPTH_MULTIPLE; + } + /* first match, mark it as found */ + found = depth; + continue; + } + if (obj->attr->cache.type == cachetype || obj->attr->cache.type == HWLOC_OBJ_CACHE_UNIFIED) + /* exact match (either unified is alone, or we match instruction or data), return immediately */ + return depth; + } + /* went to the bottom, return what we found */ + return found; +} + +/** \brief Get the first cache covering a cpuset \p set + * + * \return \c NULL if no cache matches. + * + * \note This function cannot work if the root object does not have a CPU set, + * e.g. if the topology is made of different machines. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_cache_covering_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_cache_covering_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set) +{ + hwloc_obj_t current = hwloc_get_obj_covering_cpuset(topology, set); + while (current) { + if (current->type == HWLOC_OBJ_CACHE) + return current; + current = current->parent; + } + return NULL; +} + +/** \brief Get the first cache shared between an object and somebody else. + * + * \return \c NULL if no cache matches or if an invalid object is given.
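+ * + * For instance, a sketch finding the cache shared between the first PU and its siblings: + * \code + * hwloc_obj_t pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0); + * hwloc_obj_t cache = hwloc_get_shared_cache_covering_obj(topology, pu); + * \endcode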
+ */ +static __hwloc_inline hwloc_obj_t +hwloc_get_shared_cache_covering_obj (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_shared_cache_covering_obj (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj) +{ + hwloc_obj_t current = obj->parent; + if (!obj->cpuset) + return NULL; + while (current && current->cpuset) { + if (!hwloc_bitmap_isequal(current->cpuset, obj->cpuset) + && current->type == HWLOC_OBJ_CACHE) + return current; + current = current->parent; + } + return NULL; +} + +/** @} */ + + + +/** \defgroup hwlocality_helper_find_misc Finding objects, miscellaneous helpers + * @{ + * + * Be sure to see the figure in \ref termsanddefs that shows a + * complete topology tree, including depths, child/sibling/cousin + * relationships, and an example of an asymmetric topology where one + * socket has fewer caches than its peers. + */ + +/** \brief Returns the object of type ::HWLOC_OBJ_PU with \p os_index. + * + * \note The \p os_index field of object should most of the time only be + * used for pretty-printing purposes. Type ::HWLOC_OBJ_PU is the only case + * where \p os_index could actually be useful, when manually binding to + * processors. + * However, using CPU sets to hide this complexity should often be preferred. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_pu_obj_by_os_index(hwloc_topology_t topology, unsigned os_index) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_pu_obj_by_os_index(hwloc_topology_t topology, unsigned os_index) +{ + hwloc_obj_t obj = NULL; + while ((obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_PU, obj)) != NULL) + if (obj->os_index == os_index) + return obj; + return NULL; +} + +/** \brief Do a depth-first traversal of the topology to find and sort + * all objects that are at the same depth as \p src. + * + * Report in \p objs up to \p max of the ones that are physically closest to \p src. + * + * \return the number of objects returned in \p objs. + * + * \return 0 if \p src is an I/O object. + * + * \note This function requires the \p src object to have a CPU set. + */ +/* TODO: rather provide an iterator? Provide a way to know how much should be allocated? By returning the total number of objects instead? */ +HWLOC_DECLSPEC unsigned hwloc_get_closest_objs (hwloc_topology_t topology, hwloc_obj_t src, hwloc_obj_t * __hwloc_restrict objs, unsigned max); + +/** \brief Find an object below another object, both specified by types and indexes. + * + * Start from the top system object and find object of type \p type1 + * and logical index \p idx1. Then look below this object and find another + * object of type \p type2 and logical index \p idx2. Indexes are specified + * within the parent, not within the entire system. + * + * For instance, if type1 is SOCKET, idx1 is 2, type2 is CORE and idx2 + * is 3, return the fourth core object below the third socket. + * + * \note This function requires these objects to have a CPU set.
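+ * + * The instance above, written as code: + * \code + * hwloc_obj_t core = hwloc_get_obj_below_by_type(topology, + * HWLOC_OBJ_SOCKET, 2, HWLOC_OBJ_CORE, 3); + * \endcode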
+ */ +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_below_by_type (hwloc_topology_t topology, + hwloc_obj_type_t type1, unsigned idx1, + hwloc_obj_type_t type2, unsigned idx2) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_below_by_type (hwloc_topology_t topology, + hwloc_obj_type_t type1, unsigned idx1, + hwloc_obj_type_t type2, unsigned idx2) +{ + hwloc_obj_t obj; + obj = hwloc_get_obj_by_type (topology, type1, idx1); + if (!obj || !obj->cpuset) + return NULL; + return hwloc_get_obj_inside_cpuset_by_type(topology, obj->cpuset, type2, idx2); +} + +/** \brief Find an object below a chain of objects specified by types and indexes. + * + * This is a generalized version of hwloc_get_obj_below_by_type(). + * + * Arrays \p typev and \p idxv must contain \p nr types and indexes. + * + * Start from the top system object and walk the arrays \p typev and \p idxv. + * For each type and logical index couple in the arrays, look under the previously found + * object to find the index-th object of the given type. + * Indexes are specified within the parent, not within the entire system. + * + * For instance, if nr is 3, typev contains NODE, SOCKET and CORE, + * and idxv contains 0, 1 and 2, return the third core object below + * the second socket below the first NUMA node. + * + * \note This function requires all these objects and the root object + * to have a CPU set. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_below_array_by_type (hwloc_topology_t topology, int nr, hwloc_obj_type_t *typev, unsigned *idxv) __hwloc_attribute_pure; +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_below_array_by_type (hwloc_topology_t topology, int nr, hwloc_obj_type_t *typev, unsigned *idxv) +{ + hwloc_obj_t obj = hwloc_get_root_obj(topology); + int i; + for(i=0; i<nr; i++) { + if (!obj || !obj->cpuset) + return NULL; + obj = hwloc_get_obj_inside_cpuset_by_type(topology, obj->cpuset, typev[i], idxv[i]); + } + return obj; +} + +/** @} */ + + + +/** \defgroup hwlocality_helper_distribute Distributing items over a topology + * @{ + */ + +/** \brief Distribute \p n items over the topology under \p root + * + * Array \p set will be filled with \p n cpusets recursively distributed + * linearly over the topology under \p root, down to depth \p until (which can + * be INT_MAX to distribute down to the finest level). + * + * This is typically useful when an application wants to distribute \p n + * threads over a machine, giving each of them as much private cache as + * possible and keeping them locally in number order. + * + * The caller may typically want to also call hwloc_bitmap_singlify() + * before binding a thread so that it does not move at all. + * + * \note This function requires the \p root object to have a CPU set. + */ +static __hwloc_inline void +hwloc_distributev(hwloc_topology_t topology, hwloc_obj_t *root, unsigned n_roots, hwloc_cpuset_t *cpuset, unsigned n, unsigned until); +static __hwloc_inline void +hwloc_distribute(hwloc_topology_t topology, hwloc_obj_t root, hwloc_cpuset_t *set, unsigned n, unsigned until) +{ + unsigned i; + if (!root->arity || n == 1 || root->depth >= until) { + /* Got to the bottom, we can't split any more, put everything there. */ + for (i=0; i<n; i++) + set[i] = hwloc_bitmap_dup(root->cpuset); + return; + } + hwloc_distributev(topology, root->children, root->arity, set, n, until); +} + +/** \brief Distribute \p n items over the topology under \p roots + * + * This is the same as hwloc_distribute, but takes an array of roots instead of + * just one root.
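+ * + * For instance, distributing one cpuset per thread over the whole machine with hwloc_distribute() (a sketch; freeing and error handling omitted): + * \code + * hwloc_cpuset_t set[4]; + * hwloc_distribute(topology, hwloc_get_root_obj(topology), set, 4, INT_MAX); + * hwloc_bitmap_singlify(set[0]); + * hwloc_set_cpubind(topology, set[0], HWLOC_CPUBIND_THREAD); + * \endcode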
+ * + * \note This function requires the \p roots objects to have a CPU set. + */ +static __hwloc_inline void +hwloc_distributev(hwloc_topology_t topology, hwloc_obj_t *roots, unsigned n_roots, hwloc_cpuset_t *set, unsigned n, unsigned until) +{ + unsigned i; + unsigned tot_weight; + hwloc_cpuset_t *cpusetp = set; + + tot_weight = 0; + for (i = 0; i < n_roots; i++) + if (roots[i]->cpuset) + tot_weight += hwloc_bitmap_weight(roots[i]->cpuset); + + for (i = 0; i < n_roots && tot_weight; i++) { + /* Give to roots[i] a portion proportional to its weight */ + unsigned weight = roots[i]->cpuset ? hwloc_bitmap_weight(roots[i]->cpuset) : 0; + unsigned chunk = (n * weight + tot_weight-1) / tot_weight; + hwloc_distribute(topology, roots[i], cpusetp, chunk, until); + cpusetp += chunk; + tot_weight -= weight; + n -= chunk; + } +} + +/** @} */ + + + +/** \defgroup hwlocality_helper_topology_sets CPU and node sets of entire topologies + * @{ + */ +/** \brief Get complete CPU set + * + * \return the complete CPU set of logical processors of the system. If the + * topology is the result of a combination of several systems, NULL is + * returned. + * + * \note The returned cpuset is not newly allocated and should thus not be + * changed or freed; hwloc_cpuset_dup must be used to obtain a local copy. + */ +static __hwloc_inline hwloc_const_cpuset_t +hwloc_topology_get_complete_cpuset(hwloc_topology_t topology) __hwloc_attribute_pure; +static __hwloc_inline hwloc_const_cpuset_t +hwloc_topology_get_complete_cpuset(hwloc_topology_t topology) +{ + return hwloc_get_root_obj(topology)->complete_cpuset; +} + +/** \brief Get topology CPU set + * + * \return the CPU set of logical processors of the system for which hwloc + * provides topology information. This is equivalent to the cpuset of the + * system object. If the topology is the result of a combination of several + * systems, NULL is returned. + * + * \note The returned cpuset is not newly allocated and should thus not be + * changed or freed; hwloc_cpuset_dup must be used to obtain a local copy. + */ +static __hwloc_inline hwloc_const_cpuset_t +hwloc_topology_get_topology_cpuset(hwloc_topology_t topology) __hwloc_attribute_pure; +static __hwloc_inline hwloc_const_cpuset_t +hwloc_topology_get_topology_cpuset(hwloc_topology_t topology) +{ + return hwloc_get_root_obj(topology)->cpuset; +} + +/** \brief Get online CPU set + * + * \return the CPU set of online logical processors of the system. If the + * topology is the result of a combination of several systems, NULL is + * returned. + * + * \note The returned cpuset is not newly allocated and should thus not be + * changed or freed; hwloc_cpuset_dup must be used to obtain a local copy. + */ +static __hwloc_inline hwloc_const_cpuset_t +hwloc_topology_get_online_cpuset(hwloc_topology_t topology) __hwloc_attribute_pure; +static __hwloc_inline hwloc_const_cpuset_t +hwloc_topology_get_online_cpuset(hwloc_topology_t topology) +{ + return hwloc_get_root_obj(topology)->online_cpuset; +} + +/** \brief Get allowed CPU set + * + * \return the CPU set of allowed logical processors of the system. If the + * topology is the result of a combination of several systems, NULL is + * returned. + * + * \note The returned cpuset is not newly allocated and should thus not be + * changed or freed; hwloc_cpuset_dup must be used to obtain a local copy.
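+ * + * E.g., a modifiable copy may be obtained with the bitmap duplication helper (a sketch using hwloc_bitmap_dup(), the bitmap-API name of the duplication routine): + * \code + * hwloc_cpuset_t copy = hwloc_bitmap_dup(hwloc_topology_get_allowed_cpuset(topology)); + * \endcode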
+ */ +static __hwloc_inline hwloc_const_cpuset_t +hwloc_topology_get_allowed_cpuset(hwloc_topology_t topology) __hwloc_attribute_pure; +static __hwloc_inline hwloc_const_cpuset_t +hwloc_topology_get_allowed_cpuset(hwloc_topology_t topology) +{ + return hwloc_get_root_obj(topology)->allowed_cpuset; +} + +/** \brief Get complete node set + * + * \return the complete node set of memory of the system. If the + * topology is the result of a combination of several systems, NULL is + * returned. + * + * \note The returned nodeset is not newly allocated and should thus not be + * changed or freed; hwloc_nodeset_dup must be used to obtain a local copy. + */ +static __hwloc_inline hwloc_const_nodeset_t +hwloc_topology_get_complete_nodeset(hwloc_topology_t topology) __hwloc_attribute_pure; +static __hwloc_inline hwloc_const_nodeset_t +hwloc_topology_get_complete_nodeset(hwloc_topology_t topology) +{ + return hwloc_get_root_obj(topology)->complete_nodeset; +} + +/** \brief Get topology node set + * + * \return the node set of memory of the system for which hwloc + * provides topology information. This is equivalent to the nodeset of the + * system object. If the topology is the result of a combination of several + * systems, NULL is returned. + * + * \note The returned nodeset is not newly allocated and should thus not be + * changed or freed; hwloc_nodeset_dup must be used to obtain a local copy. + */ +static __hwloc_inline hwloc_const_nodeset_t +hwloc_topology_get_topology_nodeset(hwloc_topology_t topology) __hwloc_attribute_pure; +static __hwloc_inline hwloc_const_nodeset_t +hwloc_topology_get_topology_nodeset(hwloc_topology_t topology) +{ + return hwloc_get_root_obj(topology)->nodeset; +} + +/** \brief Get allowed node set + * + * \return the node set of allowed memory of the system. If the + * topology is the result of a combination of several systems, NULL is + * returned. + * + * \note The returned nodeset is not newly allocated and should thus not be + * changed or freed; hwloc_nodeset_dup must be used to obtain a local copy. + */ +static __hwloc_inline hwloc_const_nodeset_t +hwloc_topology_get_allowed_nodeset(hwloc_topology_t topology) __hwloc_attribute_pure; +static __hwloc_inline hwloc_const_nodeset_t +hwloc_topology_get_allowed_nodeset(hwloc_topology_t topology) +{ + return hwloc_get_root_obj(topology)->allowed_nodeset; +} + +/** @} */ + + + +/** \defgroup hwlocality_helper_nodeset_convert Converting between CPU sets and node sets + * + * There are two semantics for converting cpusets to nodesets depending on how + * non-NUMA machines are handled. + * + * When manipulating nodesets for memory binding, non-NUMA machines should be + * considered as having a single NUMA node. The standard conversion routines + * below should be used so that marking the first bit of the nodeset means + * that memory should be bound to the whole (non-NUMA) machine. + * + * When manipulating nodesets as an actual list of NUMA nodes without any + * need to handle memory binding on non-NUMA machines, the strict conversion + * routines may be used instead. + * @{ + */ + +/** \brief Convert a CPU set into a NUMA node set and handle non-NUMA cases + * + * If some NUMA nodes have no CPUs at all, this function never sets their + * indexes in the output node set, even if a full CPU set is given in input. + * + * If the topology contains no NUMA nodes, the machine is considered + * as a single memory node, and the following behavior is used: + * If \p cpuset is empty, \p nodeset will be emptied as well.
+ * Otherwise \p nodeset will be entirely filled. + */ +static __hwloc_inline void +hwloc_cpuset_to_nodeset(hwloc_topology_t topology, hwloc_const_cpuset_t _cpuset, hwloc_nodeset_t nodeset) +{ + int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE); + hwloc_obj_t obj; + + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) { + if (hwloc_bitmap_iszero(_cpuset)) + hwloc_bitmap_zero(nodeset); + else + /* Assume the whole system */ + hwloc_bitmap_fill(nodeset); + return; + } + + hwloc_bitmap_zero(nodeset); + obj = NULL; + while ((obj = hwloc_get_next_obj_covering_cpuset_by_depth(topology, _cpuset, depth, obj)) != NULL) + hwloc_bitmap_set(nodeset, obj->os_index); +} + +/** \brief Convert a CPU set into a NUMA node set without handling non-NUMA cases + * + * This is the strict variant of ::hwloc_cpuset_to_nodeset. It does not fix + * non-NUMA cases. If the topology contains some NUMA nodes, it behaves exactly + * the same. However, if the topology contains no NUMA nodes, an empty + * nodeset is returned. + */ +static __hwloc_inline void +hwloc_cpuset_to_nodeset_strict(struct hwloc_topology *topology, hwloc_const_cpuset_t _cpuset, hwloc_nodeset_t nodeset) +{ + int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE); + hwloc_obj_t obj; + hwloc_bitmap_zero(nodeset); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) + return; + obj = NULL; + while ((obj = hwloc_get_next_obj_covering_cpuset_by_depth(topology, _cpuset, depth, obj)) != NULL) + hwloc_bitmap_set(nodeset, obj->os_index); +} + +/** \brief Convert a NUMA node set into a CPU set and handle non-NUMA cases + * + * If the topology contains no NUMA nodes, the machine is considered + * as a single memory node, and the following behavior is used: + * If \p nodeset is empty, \p cpuset will be emptied as well. + * Otherwise \p cpuset will be entirely filled. + * This is useful for manipulating memory binding sets. + */ +static __hwloc_inline void +hwloc_cpuset_from_nodeset(hwloc_topology_t topology, hwloc_cpuset_t _cpuset, hwloc_const_nodeset_t nodeset) +{ + int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE); + hwloc_obj_t obj; + + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) { + if (hwloc_bitmap_iszero(nodeset)) + hwloc_bitmap_zero(_cpuset); + else + /* Assume the whole system */ + hwloc_bitmap_fill(_cpuset); + return; + } + + hwloc_bitmap_zero(_cpuset); + obj = NULL; + while ((obj = hwloc_get_next_obj_by_depth(topology, depth, obj)) != NULL) { + if (hwloc_bitmap_isset(nodeset, obj->os_index)) + /* no need to check obj->cpuset because objects in levels always have a cpuset */ + hwloc_bitmap_or(_cpuset, _cpuset, obj->cpuset); + } +} + +/** \brief Convert a NUMA node set into a CPU set without handling non-NUMA cases + * + * This is the strict variant of ::hwloc_cpuset_from_nodeset. It does not fix + * non-NUMA cases. If the topology contains some NUMA nodes, it behaves exactly + * the same. However, if the topology contains no NUMA nodes, an empty + * cpuset is returned.
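+ * + * A conversion sketch using the non-strict variant to bind CPUs near a set of NUMA nodes: + * \code + * hwloc_cpuset_t cpuset = hwloc_bitmap_alloc(); + * hwloc_cpuset_from_nodeset(topology, cpuset, nodeset); + * hwloc_set_cpubind(topology, cpuset, 0); + * hwloc_bitmap_free(cpuset); + * \endcode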
+ */ +static __hwloc_inline void +hwloc_cpuset_from_nodeset_strict(struct hwloc_topology *topology, hwloc_cpuset_t _cpuset, hwloc_const_nodeset_t nodeset) +{ + int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE); + hwloc_obj_t obj; + hwloc_bitmap_zero(_cpuset); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) + return; + obj = NULL; + while ((obj = hwloc_get_next_obj_by_depth(topology, depth, obj)) != NULL) + if (hwloc_bitmap_isset(nodeset, obj->os_index)) + /* no need to check obj->cpuset because objects in levels always have a cpuset */ + hwloc_bitmap_or(_cpuset, _cpuset, obj->cpuset); +} + +/** @} */ + + + +/** \defgroup hwlocality_distances Manipulating Distances + * @{ + */ + +/** \brief Get the distances between all objects at the given depth. + * + * \return a distances structure containing a matrix with all distances + * between all objects at the given depth. + * + * Slot i+nbobjs*j contains the distance from the object of logical index i + * to the object of logical index j. + * + * \note This function only returns matrices covering the whole topology, + * without any unknown distance value. Those matrices are available in the + * top-level object of the hierarchy. Matrices of lower objects are not + * reported here since they cover only part of the machine. + * + * The returned structure belongs to the hwloc library. The caller should + * not modify or free it. + * + * \return \c NULL if no such distance matrix exists. + */ + +static __hwloc_inline const struct hwloc_distances_s * +hwloc_get_whole_distance_matrix_by_depth(hwloc_topology_t topology, unsigned depth) +{ + hwloc_obj_t root = hwloc_get_root_obj(topology); + unsigned i; + for(i=0; i<root->distances_count; i++) + if (root->distances[i]->relative_depth == depth) + return root->distances[i]; + return NULL; +} + +/** \brief Get the distances between all objects of a given type. + * + * \return a distances structure containing a matrix with all distances + * between all objects of the given type. + * + * Slot i+nbobjs*j contains the distance from the object of logical index i + * to the object of logical index j. + * + * \note This function only returns matrices covering the whole topology, + * without any unknown distance value. Those matrices are available in the + * top-level object of the hierarchy. Matrices of lower objects are not + * reported here since they cover only part of the machine. + * + * The returned structure belongs to the hwloc library. The caller should + * not modify or free it. + * + * \return \c NULL if no such distance matrix exists. + */ + +static __hwloc_inline const struct hwloc_distances_s * +hwloc_get_whole_distance_matrix_by_type(hwloc_topology_t topology, hwloc_obj_type_t type) +{ + int depth = hwloc_get_type_depth(topology, type); + if (depth < 0) + return NULL; + return hwloc_get_whole_distance_matrix_by_depth(topology, depth); +} + +/** \brief Get distances for the given depth and covering some objects + * + * Return a distance matrix that describes depth \p depth and covers at + * least object \p obj and all its children. + * + * When looking for the distance between some objects, a common ancestor should + * be passed in \p obj. + * + * \p firstp is set to logical index of the first object described by the matrix. + * + * The returned structure belongs to the hwloc library. The caller should + * not modify or free it.
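+ * + * For instance (a sketch; \c l1 and \c l2 stand for the logical indexes of two objects at depth \p depth below \p obj, and the \c NULL case is not handled): + * \code + * unsigned first; + * const struct hwloc_distances_s *m = + * hwloc_get_distance_matrix_covering_obj_by_depth(topology, obj, depth, &first); + * float d = m->latency[(l1-first)*m->nbobjs + (l2-first)]; + * \endcode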
+ */ +static __hwloc_inline const struct hwloc_distances_s * +hwloc_get_distance_matrix_covering_obj_by_depth(hwloc_topology_t topology, + hwloc_obj_t obj, unsigned depth, + unsigned *firstp) +{ + while (obj && obj->cpuset) { + unsigned i; + for(i=0; i<obj->distances_count; i++) + if (obj->distances[i]->relative_depth == depth - obj->depth) { + if (!obj->distances[i]->nbobjs) + continue; + *firstp = hwloc_get_next_obj_inside_cpuset_by_depth(topology, obj->cpuset, depth, NULL)->logical_index; + return obj->distances[i]; + } + obj = obj->parent; + } + return NULL; +} + +/** \brief Get the latency in both directions between two objects. + * + * Look at ancestor objects from the bottom to the top until one of them + * contains a distance matrix that matches the objects exactly. + * + * \p latency gets the value from object \p obj1 to \p obj2, while + * \p reverse_latency gets the reverse-direction value, which + * may be different on some architectures. + * + * \return -1 if no ancestor contains a matching latency matrix. + */ +static __hwloc_inline int +hwloc_get_latency(hwloc_topology_t topology, + hwloc_obj_t obj1, hwloc_obj_t obj2, + float *latency, float *reverse_latency) +{ + hwloc_obj_t ancestor; + const struct hwloc_distances_s * distances; + unsigned first_logical; + + if (obj1->depth != obj2->depth) { + errno = EINVAL; + return -1; + } + + ancestor = hwloc_get_common_ancestor_obj(topology, obj1, obj2); + distances = hwloc_get_distance_matrix_covering_obj_by_depth(topology, ancestor, obj1->depth, &first_logical); + if (distances && distances->latency) { + const float * latency_matrix = distances->latency; + unsigned nbobjs = distances->nbobjs; + unsigned l1 = obj1->logical_index - first_logical; + unsigned l2 = obj2->logical_index - first_logical; + *latency = latency_matrix[l1*nbobjs+l2]; + *reverse_latency = latency_matrix[l2*nbobjs+l1]; + return 0; + } + + errno = ENOSYS; + return -1; +} + +/** @} */ + + + +/** \defgroup hwlocality_advanced_io Finding I/O objects + * @{ + */ + +/** \brief Get the first non-I/O ancestor object. + * + * Given the I/O object \p ioobj, find the smallest non-I/O ancestor + * object. This regular object may then be used for binding because + * its locality is the same as \p ioobj. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_non_io_ancestor_obj(hwloc_topology_t topology __hwloc_attribute_unused, + hwloc_obj_t ioobj) +{ + hwloc_obj_t obj = ioobj; + while (obj && !obj->cpuset) { + obj = obj->parent; + } + return obj; +} + +/** \brief Get the next PCI device in the system. + * + * \return the first PCI device if \p prev is \c NULL. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_next_pcidev(hwloc_topology_t topology, hwloc_obj_t prev) +{ + return hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_PCI_DEVICE, prev); +} + +/** \brief Find the PCI device object matching the PCI bus id + * given by domain, bus, device and function. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_pcidev_by_busid(hwloc_topology_t topology, + unsigned domain, unsigned bus, unsigned dev, unsigned func) +{ + hwloc_obj_t obj = NULL; + while ((obj = hwloc_get_next_pcidev(topology, obj)) != NULL) { + if (obj->attr->pcidev.domain == domain + && obj->attr->pcidev.bus == bus + && obj->attr->pcidev.dev == dev + && obj->attr->pcidev.func == func) + return obj; + } + return NULL; +} + +/** \brief Find the PCI device object matching the PCI bus id + * given as a string xxxx:yy:zz.t or yy:zz.t.
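+ * + * For example (a sketch; the bus id string is hypothetical): + * \code + * hwloc_obj_t pcidev = hwloc_get_pcidev_by_busidstring(topology, "0000:02:00.0"); + * \endcode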
+ */ +static __hwloc_inline hwloc_obj_t +hwloc_get_pcidev_by_busidstring(hwloc_topology_t topology, const char *busid) +{ + unsigned domain = 0; /* default */ + unsigned bus, dev, func; + + if (sscanf(busid, "%x:%x.%x", &bus, &dev, &func) != 3 + && sscanf(busid, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) { + errno = EINVAL; + return NULL; + } + + return hwloc_get_pcidev_by_busid(topology, domain, bus, dev, func); +} + +/** \brief Get the next OS device in the system. + * + * \return the first OS device if \p prev is \c NULL. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_next_osdev(hwloc_topology_t topology, hwloc_obj_t prev) +{ + return hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_OS_DEVICE, prev); +} + +/** \brief Get the next bridge in the system. + * + * \return the first bridge if \p prev is \c NULL. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_next_bridge(hwloc_topology_t topology, hwloc_obj_t prev) +{ + return hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_BRIDGE, prev); +} + +/** \brief Checks whether a given bridge covers a given PCI bus. + */ +static __hwloc_inline int +hwloc_bridge_covers_pcibus(hwloc_obj_t bridge, + unsigned domain, unsigned bus) +{ + return bridge->type == HWLOC_OBJ_BRIDGE + && bridge->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI + && bridge->attr->bridge.downstream.pci.domain == domain + && bridge->attr->bridge.downstream.pci.secondary_bus <= bus + && bridge->attr->bridge.downstream.pci.subordinate_bus >= bus; +} + +/** \brief Find the hostbridge that covers the given PCI bus. + * + * This is useful for finding the locality of a bus: it is given by + * the cpuset of the hostbridge's parent object. + */ +static __hwloc_inline hwloc_obj_t +hwloc_get_hostbridge_by_pcibus(hwloc_topology_t topology, + unsigned domain, unsigned bus) +{ + hwloc_obj_t obj = NULL; + while ((obj = hwloc_get_next_bridge(topology, obj)) != NULL) { + if (hwloc_bridge_covers_pcibus(obj, domain, bus)) { + /* found bridge covering this pcibus, make sure it's a hostbridge */ + assert(obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_HOST); + assert(obj->parent->type != HWLOC_OBJ_BRIDGE); + assert(obj->parent->cpuset); + return obj; + } + } + return NULL; +} + +/** @} */ + + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_HELPER_H */ diff --git a/ext/hwloc/include/hwloc/inlines.h b/ext/hwloc/include/hwloc/inlines.h new file mode 100644 index 000000000..34d845c10 --- /dev/null +++ b/ext/hwloc/include/hwloc/inlines.h @@ -0,0 +1,154 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2013 Inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2009-2010 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory.
+ */ + +/** + * This file contains the inline code of functions declared in hwloc.h + */ + +#ifndef HWLOC_INLINES_H +#define HWLOC_INLINES_H + +#ifndef HWLOC_H +#error Please include the main hwloc.h instead +#endif + +#include <stdlib.h> +#include <errno.h> + + +#ifdef __cplusplus +extern "C" { +#endif + +static __hwloc_inline int +hwloc_get_type_or_below_depth (hwloc_topology_t topology, hwloc_obj_type_t type) +{ + int depth = hwloc_get_type_depth(topology, type); + + if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) + return depth; + + /* find the highest existing level with type order >= */ + for(depth = hwloc_get_type_depth(topology, HWLOC_OBJ_PU); ; depth--) + if (hwloc_compare_types(hwloc_get_depth_type(topology, depth), type) < 0) + return depth+1; + + /* Shouldn't ever happen, as there is always a SYSTEM level with lower order and known depth. */ + /* abort(); */ +} + +static __hwloc_inline int +hwloc_get_type_or_above_depth (hwloc_topology_t topology, hwloc_obj_type_t type) +{ + int depth = hwloc_get_type_depth(topology, type); + + if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) + return depth; + + /* find the lowest existing level with type order <= */ + for(depth = 0; ; depth++) + if (hwloc_compare_types(hwloc_get_depth_type(topology, depth), type) > 0) + return depth-1; + + /* Shouldn't ever happen, as there is always a PU level with higher order and known depth. */ + /* abort(); */ +} + +static __hwloc_inline int +hwloc_get_nbobjs_by_type (hwloc_topology_t topology, hwloc_obj_type_t type) +{ + int depth = hwloc_get_type_depth(topology, type); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) + return 0; + if (depth == HWLOC_TYPE_DEPTH_MULTIPLE) + return -1; /* FIXME: aggregate nbobjs from different levels? */ + return hwloc_get_nbobjs_by_depth(topology, depth); +} + +static __hwloc_inline hwloc_obj_t +hwloc_get_obj_by_type (hwloc_topology_t topology, hwloc_obj_type_t type, unsigned idx) +{ + int depth = hwloc_get_type_depth(topology, type); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) + return NULL; + if (depth == HWLOC_TYPE_DEPTH_MULTIPLE) + return NULL; + return hwloc_get_obj_by_depth(topology, depth, idx); +} + +static __hwloc_inline hwloc_obj_t +hwloc_get_next_obj_by_depth (hwloc_topology_t topology, unsigned depth, hwloc_obj_t prev) +{ + if (!prev) + return hwloc_get_obj_by_depth (topology, depth, 0); + if (prev->depth != depth) + return NULL; + return prev->next_cousin; +} + +static __hwloc_inline hwloc_obj_t +hwloc_get_next_obj_by_type (hwloc_topology_t topology, hwloc_obj_type_t type, + hwloc_obj_t prev) +{ + int depth = hwloc_get_type_depth(topology, type); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE) + return NULL; + return hwloc_get_next_obj_by_depth (topology, depth, prev); +} + +static __hwloc_inline hwloc_obj_t +hwloc_get_root_obj (hwloc_topology_t topology) +{ + return hwloc_get_obj_by_depth (topology, 0, 0); +} + +static __hwloc_inline const char * +hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name) +{ + unsigned i; + for(i=0; i<obj->infos_count; i++) + if (!strcmp(obj->infos[i].name, name)) + return obj->infos[i].value; + return NULL; +} + +static __hwloc_inline void * +hwloc_alloc_membind_policy_nodeset(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) +{ + void *p = hwloc_alloc_membind_nodeset(topology, len, nodeset, policy, flags); + if (p) + return p; + hwloc_set_membind_nodeset(topology, nodeset, policy, flags); + p = hwloc_alloc(topology, len); + if (p && policy != HWLOC_MEMBIND_FIRSTTOUCH) + /* Enforce the
binding by touching the data */ + memset(p, 0, len); + return p; +} + +static __hwloc_inline void * +hwloc_alloc_membind_policy(hwloc_topology_t topology, size_t len, hwloc_const_cpuset_t set, hwloc_membind_policy_t policy, int flags) +{ + void *p = hwloc_alloc_membind(topology, len, set, policy, flags); + if (p) + return p; + hwloc_set_membind(topology, set, policy, flags); + p = hwloc_alloc(topology, len); + if (p && policy != HWLOC_MEMBIND_FIRSTTOUCH) + /* Enforce the binding by touching the data */ + memset(p, 0, len); + return p; +} + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_INLINES_H */ diff --git a/ext/hwloc/include/hwloc/intel-mic.h b/ext/hwloc/include/hwloc/intel-mic.h new file mode 100644 index 000000000..d58237b3d --- /dev/null +++ b/ext/hwloc/include/hwloc/intel-mic.h @@ -0,0 +1,143 @@ +/* + * Copyright © 2013 Inria. All rights reserved. + * See COPYING in top-level directory. + */ + +/** \file + * \brief Macros to help interaction between hwloc and Intel Xeon Phi (MIC). + * + * Applications that use both hwloc and Intel Xeon Phi (MIC) may want to + * include this file so as to get topology information for MIC devices. + */ + +#ifndef HWLOC_INTEL_MIC_H +#define HWLOC_INTEL_MIC_H + +#include <hwloc.h> +#include <hwloc/autogen/config.h> +#include <hwloc/helper.h> +#ifdef HWLOC_LINUX_SYS +#include <hwloc/linux.h> +#include <dirent.h> +#include <string.h> +#endif + +#include <stdio.h> +#include <stdlib.h> + + +#ifdef __cplusplus +extern "C" { +#endif + + +/** \defgroup hwlocality_intel_mic Interoperability with Intel Xeon Phi (MIC) + * + * This interface offers ways to retrieve topology information about + * Intel Xeon Phi (MIC) devices. + * + * @{ + */ + +/** \brief Get the CPU set of logical processors that are physically + * close to MIC device whose index is \p idx. + * + * Return the CPU set describing the locality of the MIC device whose index is \p idx. + * + * Topology \p topology and device index \p idx must match the local machine. + * I/O device detection is not needed in the topology. + * + * The function only returns the locality of the device. + * If more information about the device is needed, OS objects should + * be used instead, see hwloc_intel_mic_get_device_osdev_by_index(). + * + * This function is currently only implemented in a meaningful way for + * Linux; other systems will simply get a full cpuset.
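+ * + * A usage sketch for the first MIC device: + * \code + * hwloc_cpuset_t set = hwloc_bitmap_alloc(); + * if (hwloc_intel_mic_get_device_cpuset(topology, 0, set) == 0) + * hwloc_set_cpubind(topology, set, 0); + * hwloc_bitmap_free(set); + * \endcode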
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_INLINES_H */
diff --git a/ext/hwloc/include/hwloc/intel-mic.h b/ext/hwloc/include/hwloc/intel-mic.h new file mode 100644 index 000000000..d58237b3d --- /dev/null +++ b/ext/hwloc/include/hwloc/intel-mic.h @@ -0,0 +1,143 @@
+/*
+ * Copyright © 2013 Inria. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and Intel Xeon Phi (MIC).
+ *
+ * Applications that use both hwloc and Intel Xeon Phi (MIC) may want to
+ * include this file so as to get topology information for MIC devices.
+ */
+
+#ifndef HWLOC_INTEL_MIC_H
+#define HWLOC_INTEL_MIC_H
+
+#include <hwloc.h>
+#include <hwloc/autogen/config.h>
+#include <hwloc/helper.h>
+#ifdef HWLOC_LINUX_SYS
+#include <hwloc/linux.h>
+#include <dirent.h>
+#include <string.h>
+#endif
+
+#include <stdio.h>
+#include <stdlib.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_intel_mic Interoperability with Intel Xeon Phi (MIC)
+ *
+ * This interface offers ways to retrieve topology information about
+ * Intel Xeon Phi (MIC) devices.
+ *
+ * @{
+ */
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to the MIC device whose index is \p idx.
+ *
+ * Return the CPU set describing the locality of the MIC device whose index is \p idx.
+ *
+ * Topology \p topology and device index \p idx must match the local machine.
+ * I/O devices detection is not needed in the topology.
+ *
+ * The function only returns the locality of the device.
+ * If more information about the device is needed, OS objects should
+ * be used instead, see hwloc_intel_mic_get_device_osdev_by_index().
+ *
+ * This function is currently only implemented in a meaningful way for
+ * Linux; other systems will simply get a full cpuset.
+ */
+static __hwloc_inline int
+hwloc_intel_mic_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
+                                  int idx __hwloc_attribute_unused,
+                                  hwloc_cpuset_t set)
+{
+#ifdef HWLOC_LINUX_SYS
+  /* If we're on Linux, use the sysfs mechanism to get the local cpus */
+#define HWLOC_INTEL_MIC_DEVICE_SYSFS_PATH_MAX 128
+  char path[HWLOC_INTEL_MIC_DEVICE_SYSFS_PATH_MAX];
+  DIR *sysdir = NULL;
+  FILE *sysfile = NULL;
+  struct dirent *dirent;
+  unsigned pcibus, pcidev, pcifunc;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  sprintf(path, "/sys/class/mic/mic%d", idx);
+  sysdir = opendir(path);
+  if (!sysdir)
+    return -1;
+
+  while ((dirent = readdir(sysdir)) != NULL) {
+    if (sscanf(dirent->d_name, "pci_%02x:%02x.%02x", &pcibus, &pcidev, &pcifunc) == 3) {
+      sprintf(path, "/sys/class/mic/mic%d/pci_%02x:%02x.%02x/local_cpus", idx, pcibus, pcidev, pcifunc);
+      sysfile = fopen(path, "r");
+      if (!sysfile) {
+        closedir(sysdir);
+        return -1;
+      }
+
+      hwloc_linux_parse_cpumap_file(sysfile, set);
+      if (hwloc_bitmap_iszero(set))
+        hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+
+      fclose(sysfile);
+      break;
+    }
+  }
+
+  closedir(sysdir);
+#else
+  /* Non-Linux systems simply get a full cpuset */
+  hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#endif
+  return 0;
+}
+
+/** \brief Get the hwloc OS device object corresponding to the
+ * MIC device for the given index.
+ *
+ * Return the OS device object describing the MIC device whose index is \p idx.
+ * Return NULL if there is none.
+ *
+ * The topology \p topology does not necessarily have to match the current
+ * machine. For instance the topology may be an XML import of a remote host.
+ * I/O devices detection must be enabled in the topology.
+ *
+ * \note The corresponding PCI device object can be obtained by looking
+ * at the OS device parent object.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_intel_mic_get_device_osdev_by_index(hwloc_topology_t topology,
+                                          unsigned idx)
+{
+  hwloc_obj_t osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    if (HWLOC_OBJ_OSDEV_COPROC == osdev->attr->osdev.type
+        && osdev->name
+        && !strncmp("mic", osdev->name, 3)
+        && atoi(osdev->name + 3) == (int) idx)
+      return osdev;
+  }
+  return NULL;
+}
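+
+/* Example: locate MIC #0 through its OS device object and look at its PCI
+ * parent (illustrative sketch; assumes I/O devices detection was enabled):
+ *
+ *   hwloc_obj_t osdev = hwloc_intel_mic_get_device_osdev_by_index(topology, 0);
+ *   if (osdev && osdev->parent && osdev->parent->type == HWLOC_OBJ_PCI_DEVICE)
+ *     printf("mic0 is close to PCI bus %02x\n", osdev->parent->attr->pcidev.bus);
+ */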
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_INTEL_MIC_H */
diff --git a/ext/hwloc/include/hwloc/linux-libnuma.h b/ext/hwloc/include/hwloc/linux-libnuma.h new file mode 100644 index 000000000..f74950437 --- /dev/null +++ b/ext/hwloc/include/hwloc/linux-libnuma.h @@ -0,0 +1,355 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2013 Inria. All rights reserved.
+ * Copyright © 2009-2010, 2012 Université Bordeaux 1
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and Linux libnuma.
+ *
+ * Applications that use both Linux libnuma and hwloc may want to
+ * include this file so as to ease conversion between their respective types.
+*/
+
+#ifndef HWLOC_LINUX_LIBNUMA_H
+#define HWLOC_LINUX_LIBNUMA_H
+
+#include <hwloc.h>
+#include <numa.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_linux_libnuma_ulongs Interoperability with Linux libnuma unsigned long masks
+ *
+ * This interface helps converting between Linux libnuma unsigned long masks
+ * and hwloc cpusets and nodesets.
+ *
+ * It also offers a consistent behavior on non-NUMA machines
+ * or non-NUMA-aware kernels by assuming that the machines have a single
+ * NUMA node.
+ *
+ * \note Topology \p topology must match the current machine.
+ *
+ * \note The behavior of libnuma is undefined if the kernel is not NUMA-aware
+ * (when CONFIG_NUMA is not set in the kernel configuration).
+ * This helper and libnuma may thus not be strictly compatible in this case,
+ * which may be detected by checking whether numa_available() returns -1.
+ *
+ * @{
+ */
+
+
+/** \brief Convert hwloc CPU set \p cpuset into the array of unsigned long \p mask
+ *
+ * \p mask is the array of unsigned long that will be filled.
+ * \p maxnode contains the maximal node number that may be stored in \p mask.
+ * \p maxnode will be set to the maximal node number that was found, plus one.
+ *
+ * This function may be used before calling set_mempolicy, mbind, migrate_pages
+ * or any other function that takes an array of unsigned long and a maximal
+ * node number as input parameter.
+ */
+static __hwloc_inline int
+hwloc_cpuset_to_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset,
+                                     unsigned long *mask, unsigned long *maxnode)
+{
+  int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE);
+  unsigned long outmaxnode = -1;
+
+  /* round-up to the next ulong and clear all bytes */
+  *maxnode = (*maxnode + 8*sizeof(*mask) - 1) & ~(8*sizeof(*mask) - 1);
+  memset(mask, 0, *maxnode/8);
+
+  if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) {
+    hwloc_obj_t node = NULL;
+    while ((node = hwloc_get_next_obj_covering_cpuset_by_depth(topology, cpuset, depth, node)) != NULL) {
+      if (node->os_index >= *maxnode)
+        continue;
+      mask[node->os_index/sizeof(*mask)/8] |= 1UL << (node->os_index % (sizeof(*mask)*8));
+      if (outmaxnode == (unsigned long) -1 || outmaxnode < node->os_index)
+        outmaxnode = node->os_index;
+    }
+
+  } else {
+    /* if no numa, libnuma assumes we have a single node */
+    if (!hwloc_bitmap_iszero(cpuset)) {
+      mask[0] = 1;
+      outmaxnode = 0;
+    }
+  }
+
+  *maxnode = outmaxnode+1;
+  return 0;
+}
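+
+/* Example: bind the current process memory near a given cpuset through the
+ * raw set_mempolicy system call (illustrative sketch; <numaif.h> is assumed
+ * to provide set_mempolicy and MPOL_BIND, and "cpuset" the target PUs):
+ *
+ *   unsigned long mask[32];
+ *   unsigned long maxnode = 32 * 8 * sizeof(unsigned long);
+ *   hwloc_cpuset_to_linux_libnuma_ulongs(topology, cpuset, mask, &maxnode);
+ *   set_mempolicy(MPOL_BIND, mask, maxnode);
+ */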
+
+/** \brief Convert hwloc NUMA node set \p nodeset into the array of unsigned long \p mask
+ *
+ * \p mask is the array of unsigned long that will be filled.
+ * \p maxnode contains the maximal node number that may be stored in \p mask.
+ * \p maxnode will be set to the maximal node number that was found, plus one.
+ *
+ * This function may be used before calling set_mempolicy, mbind, migrate_pages
+ * or any other function that takes an array of unsigned long and a maximal
+ * node number as input parameter.
+ */
+static __hwloc_inline int
+hwloc_nodeset_to_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset,
+                                      unsigned long *mask, unsigned long *maxnode)
+{
+  int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE);
+  unsigned long outmaxnode = -1;
+
+  /* round-up to the next ulong and clear all bytes */
+  *maxnode = (*maxnode + 8*sizeof(*mask) - 1) & ~(8*sizeof(*mask) - 1);
+  memset(mask, 0, *maxnode/8);
+
+  if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) {
+    hwloc_obj_t node = NULL;
+    while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL) {
+      if (node->os_index >= *maxnode)
+        continue;
+      if (!hwloc_bitmap_isset(nodeset, node->os_index))
+        continue;
+      mask[node->os_index/sizeof(*mask)/8] |= 1UL << (node->os_index % (sizeof(*mask)*8));
+      if (outmaxnode == (unsigned long) -1 || outmaxnode < node->os_index)
+        outmaxnode = node->os_index;
+    }
+
+  } else {
+    /* if no numa, libnuma assumes we have a single node */
+    if (!hwloc_bitmap_iszero(nodeset)) {
+      mask[0] = 1;
+      outmaxnode = 0;
+    }
+  }
+
+  *maxnode = outmaxnode+1;
+  return 0;
+}
+
+/** \brief Convert the array of unsigned long \p mask into hwloc CPU set
+ *
+ * \p mask is an array of unsigned long that will be read.
+ * \p maxnode contains the maximal node number that may be read in \p mask.
+ *
+ * This function may be used after calling get_mempolicy or any other function
+ * that takes an array of unsigned long as output parameter (and possibly
+ * a maximal node number as input parameter).
+ */
+static __hwloc_inline int
+hwloc_cpuset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_cpuset_t cpuset,
+                                       const unsigned long *mask, unsigned long maxnode)
+{
+  int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE);
+
+  if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) {
+    hwloc_obj_t node = NULL;
+    hwloc_bitmap_zero(cpuset);
+    while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
+      if (node->os_index < maxnode
+          && (mask[node->os_index/sizeof(*mask)/8] & (1UL << (node->os_index % (sizeof(*mask)*8)))))
+        hwloc_bitmap_or(cpuset, cpuset, node->cpuset);
+  } else {
+    /* if no numa, libnuma assumes we have a single node */
+    if (mask[0] & 1)
+      hwloc_bitmap_copy(cpuset, hwloc_topology_get_complete_cpuset(topology));
+    else
+      hwloc_bitmap_zero(cpuset);
+  }
+
+  return 0;
+}
+
+/** \brief Convert the array of unsigned long \p mask into hwloc NUMA node set
+ *
+ * \p mask is an array of unsigned long that will be read.
+ * \p maxnode contains the maximal node number that may be read in \p mask.
+ *
+ * This function may be used after calling get_mempolicy or any other function
+ * that takes an array of unsigned long as output parameter (and possibly
+ * a maximal node number as input parameter).
+ */
+static __hwloc_inline int
+hwloc_nodeset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_nodeset_t nodeset,
+                                        const unsigned long *mask, unsigned long maxnode)
+{
+  int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE);
+
+  if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) {
+    hwloc_obj_t node = NULL;
+    hwloc_bitmap_zero(nodeset);
+    while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
+      if (node->os_index < maxnode
+          && (mask[node->os_index/sizeof(*mask)/8] & (1UL << (node->os_index % (sizeof(*mask)*8)))))
+        hwloc_bitmap_set(nodeset, node->os_index);
+  } else {
+    /* if no numa, libnuma assumes we have a single node */
+    if (mask[0] & 1)
+      hwloc_bitmap_fill(nodeset);
+    else
+      hwloc_bitmap_zero(nodeset);
+  }
+
+  return 0;
+}
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_linux_libnuma_bitmask Interoperability with Linux libnuma bitmask
+ *
+ * This interface helps converting between Linux libnuma bitmasks
+ * and hwloc cpusets and nodesets.
+ *
+ * It also offers a consistent behavior on non-NUMA machines
+ * or non-NUMA-aware kernels by assuming that the machines have a single
+ * NUMA node.
+ *
+ * \note Topology \p topology must match the current machine.
+ *
+ * \note The behavior of libnuma is undefined if the kernel is not NUMA-aware
+ * (when CONFIG_NUMA is not set in the kernel configuration).
+ * This helper and libnuma may thus not be strictly compatible in this case,
+ * which may be detected by checking whether numa_available() returns -1.
+ *
+ * @{
+ */
+
+
+/** \brief Convert hwloc CPU set \p cpuset into the returned libnuma bitmask
+ *
+ * The returned bitmask should later be freed with numa_bitmask_free.
+ *
+ * This function may be used before calling many numa_ functions
+ * that use a struct bitmask as an input parameter.
+ *
+ * \return newly allocated struct bitmask.
+ */
+static __hwloc_inline struct bitmask *
+hwloc_cpuset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset) __hwloc_attribute_malloc;
+static __hwloc_inline struct bitmask *
+hwloc_cpuset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset)
+{
+  int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE);
+  struct bitmask *bitmask = numa_allocate_cpumask();
+  if (!bitmask)
+    return NULL;
+
+  if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) {
+    hwloc_obj_t node = NULL;
+    while ((node = hwloc_get_next_obj_covering_cpuset_by_depth(topology, cpuset, depth, node)) != NULL)
+      if (node->memory.local_memory)
+        numa_bitmask_setbit(bitmask, node->os_index);
+  } else {
+    /* if no numa, libnuma assumes we have a single node */
+    if (!hwloc_bitmap_iszero(cpuset))
+      numa_bitmask_setbit(bitmask, 0);
+  }
+
+  return bitmask;
+}
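+
+/* Example: interleave the process memory allocation policy across the NUMA
+ * nodes close to "cpuset" (illustrative sketch, using the libnuma v2 API):
+ *
+ *   struct bitmask *bm = hwloc_cpuset_to_linux_libnuma_bitmask(topology, cpuset);
+ *   if (bm) {
+ *     numa_set_interleave_mask(bm);
+ *     numa_bitmask_free(bm);
+ *   }
+ */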
+
+/** \brief Convert hwloc NUMA node set \p nodeset into the returned libnuma bitmask
+ *
+ * The returned bitmask should later be freed with numa_bitmask_free.
+ *
+ * This function may be used before calling many numa_ functions
+ * that use a struct bitmask as an input parameter.
+ *
+ * \return newly allocated struct bitmask.
+ */
+static __hwloc_inline struct bitmask *
+hwloc_nodeset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset) __hwloc_attribute_malloc;
+static __hwloc_inline struct bitmask *
+hwloc_nodeset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset)
+{
+  int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE);
+  struct bitmask *bitmask = numa_allocate_cpumask();
+  if (!bitmask)
+    return NULL;
+
+  if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) {
+    hwloc_obj_t node = NULL;
+    while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
+      if (hwloc_bitmap_isset(nodeset, node->os_index) && node->memory.local_memory)
+        numa_bitmask_setbit(bitmask, node->os_index);
+  } else {
+    /* if no numa, libnuma assumes we have a single node */
+    if (!hwloc_bitmap_iszero(nodeset))
+      numa_bitmask_setbit(bitmask, 0);
+  }
+
+  return bitmask;
+}
+
+/** \brief Convert libnuma bitmask \p bitmask into hwloc CPU set \p cpuset
+ *
+ * This function may be used after calling many numa_ functions
+ * that use a struct bitmask as an output parameter.
+ */
+static __hwloc_inline int
+hwloc_cpuset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_cpuset_t cpuset,
+                                        const struct bitmask *bitmask)
+{
+  int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE);
+
+  if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) {
+    hwloc_obj_t node = NULL;
+    hwloc_bitmap_zero(cpuset);
+    while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
+      if (numa_bitmask_isbitset(bitmask, node->os_index))
+        hwloc_bitmap_or(cpuset, cpuset, node->cpuset);
+  } else {
+    /* if no numa, libnuma assumes we have a single node */
+    if (numa_bitmask_isbitset(bitmask, 0))
+      hwloc_bitmap_copy(cpuset, hwloc_topology_get_complete_cpuset(topology));
+    else
+      hwloc_bitmap_zero(cpuset);
+  }
+
+  return 0;
+}
+
+/** \brief Convert libnuma bitmask \p bitmask into hwloc NUMA node set \p nodeset
+ *
+ * This function may be used after calling many numa_ functions
+ * that use a struct bitmask as an output parameter.
+ */
+static __hwloc_inline int
+hwloc_nodeset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_nodeset_t nodeset,
+                                         const struct bitmask *bitmask)
+{
+  int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NODE);
+
+  if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) {
+    hwloc_obj_t node = NULL;
+    hwloc_bitmap_zero(nodeset);
+    while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
+      if (numa_bitmask_isbitset(bitmask, node->os_index))
+        hwloc_bitmap_set(nodeset, node->os_index);
+  } else {
+    /* if no numa, libnuma assumes we have a single node */
+    if (numa_bitmask_isbitset(bitmask, 0))
+      hwloc_bitmap_fill(nodeset);
+    else
+      hwloc_bitmap_zero(nodeset);
+  }
+
+  return 0;
+}
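+
+/* Example: convert the set of NUMA nodes libnuma allows allocation on into
+ * a hwloc nodeset (illustrative sketch, using the libnuma v2 API):
+ *
+ *   hwloc_nodeset_t nodeset = hwloc_bitmap_alloc();
+ *   struct bitmask *bm = numa_get_mems_allowed();
+ *   hwloc_nodeset_from_linux_libnuma_bitmask(topology, nodeset, bm);
+ *   numa_bitmask_free(bm);
+ */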
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_LINUX_LIBNUMA_H */
diff --git a/ext/hwloc/include/hwloc/linux.h b/ext/hwloc/include/hwloc/linux.h new file mode 100644 index 000000000..1df904651 --- /dev/null +++ b/ext/hwloc/include/hwloc/linux.h @@ -0,0 +1,70 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2013 Inria. All rights reserved.
+ * Copyright © 2009-2011 Université Bordeaux 1
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and Linux.
+ *
+ * Applications that use hwloc on Linux may want to include this file
+ * if using some low-level Linux features.
+ */
+
+#ifndef HWLOC_LINUX_H
+#define HWLOC_LINUX_H
+
+#include <hwloc.h>
+#include <stdio.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_linux Linux-specific helpers
+ *
+ * This includes helpers for manipulating Linux kernel cpumap files, and hwloc
+ * equivalents of the Linux sched_setaffinity and sched_getaffinity system calls.
+ *
+ * @{
+ */
+
+/** \brief Convert a linux kernel cpumap file \p file into hwloc CPU set.
+ *
+ * Might be used when reading CPU set from sysfs attributes such as topology
+ * and caches for processors, or local_cpus for devices.
+ */
+HWLOC_DECLSPEC int hwloc_linux_parse_cpumap_file(FILE *file, hwloc_cpuset_t set);
+
+/** \brief Bind a thread \p tid on cpus given in cpuset \p set
+ *
+ * The behavior is exactly the same as the Linux sched_setaffinity system call,
+ * but uses a hwloc cpuset.
+ *
+ * \note This is equivalent to calling hwloc_set_proc_cpubind() with
+ * HWLOC_CPUBIND_THREAD as flags.
+ */
+HWLOC_DECLSPEC int hwloc_linux_set_tid_cpubind(hwloc_topology_t topology, pid_t tid, hwloc_const_cpuset_t set);
+
+/** \brief Get the current binding of thread \p tid
+ *
+ * The behavior is exactly the same as the Linux sched_getaffinity system call,
+ * but uses a hwloc cpuset.
+ *
+ * \note This is equivalent to calling hwloc_get_proc_cpubind() with
+ * HWLOC_CPUBIND_THREAD as flags.
+ */
+HWLOC_DECLSPEC int hwloc_linux_get_tid_cpubind(hwloc_topology_t topology, pid_t tid, hwloc_cpuset_t set);
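+
+/* Example: bind a specific Linux thread (by tid) to the first PU of the
+ * machine (illustrative sketch; SYS_gettid is used since glibc historically
+ * lacked a gettid() wrapper):
+ *
+ *   #include <sys/syscall.h>
+ *   pid_t tid = syscall(SYS_gettid);
+ *   hwloc_obj_t pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
+ *   hwloc_linux_set_tid_cpubind(topology, tid, pu->cpuset);
+ */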
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_LINUX_H */
diff --git a/ext/hwloc/include/hwloc/myriexpress.h b/ext/hwloc/include/hwloc/myriexpress.h new file mode 100644 index 000000000..ac751bcfb --- /dev/null +++ b/ext/hwloc/include/hwloc/myriexpress.h @@ -0,0 +1,127 @@
+/*
+ * Copyright © 2010-2013 Inria. All rights reserved.
+ * Copyright © 2011 Cisco Systems, Inc. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and Myrinet Express.
+ *
+ * Applications that use both hwloc and Myrinet Express verbs may want to
+ * include this file so as to get topology information for Myrinet hardware.
+ *
+ */
+
+#ifndef HWLOC_MYRIEXPRESS_H
+#define HWLOC_MYRIEXPRESS_H
+
+#include <hwloc.h>
+#include <hwloc/autogen/config.h>
+
+#include <myriexpress.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_myriexpress Interoperability with Myrinet Express
+ *
+ * This interface offers ways to retrieve topology information about
+ * Myrinet Express hardware.
+ *
+ * @{
+ */
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to the MX board \p id.
+ *
+ * Return the CPU set describing the locality of the Myrinet Express
+ * board whose index is \p id.
+ *
+ * Topology \p topology and device \p id must match the local machine.
+ * I/O devices detection is not needed in the topology.
+ *
+ * The function only returns the locality of the device.
+ * No additional information about the device is available.
+ */
+static __hwloc_inline int
+hwloc_mx_board_get_device_cpuset(hwloc_topology_t topology,
+                                 unsigned id, hwloc_cpuset_t set)
+{
+  uint32_t in, out;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  in = id;
+  if (mx_get_info(NULL, MX_NUMA_NODE, &in, sizeof(in), &out, sizeof(out)) != MX_SUCCESS) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (out != (uint32_t) -1) {
+    hwloc_obj_t obj = NULL;
+    while ((obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NODE, obj)) != NULL)
+      if (obj->os_index == out) {
+        hwloc_bitmap_copy(set, obj->cpuset);
+        goto out;
+      }
+  }
+  /* fallback to the full topology cpuset */
+  hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+
+ out:
+  return 0;
+}
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to the MX endpoint \p endpoint.
+ *
+ * Return the CPU set describing the locality of the Myrinet Express
+ * board that runs the MX endpoint \p endpoint.
+ *
+ * Topology \p topology and endpoint \p endpoint must match the local machine.
+ * I/O devices detection is not needed in the topology.
+ *
+ * The function only returns the locality of the endpoint.
+ * No additional information about the endpoint or device is available.
+ */
+static __hwloc_inline int
+hwloc_mx_endpoint_get_device_cpuset(hwloc_topology_t topology,
+                                    mx_endpoint_t endpoint, hwloc_cpuset_t set)
+{
+  uint64_t nid;
+  uint32_t nindex, eid;
+  mx_endpoint_addr_t eaddr;
+
+  if (mx_get_endpoint_addr(endpoint, &eaddr) != MX_SUCCESS) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (mx_decompose_endpoint_addr(eaddr, &nid, &eid) != MX_SUCCESS) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (mx_nic_id_to_board_number(nid, &nindex) != MX_SUCCESS) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  return hwloc_mx_board_get_device_cpuset(topology, nindex, set);
+}
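+
+/* Example: retrieve the locality of MX board 0 and bind the current process
+ * near it (illustrative sketch):
+ *
+ *   hwloc_cpuset_t set = hwloc_bitmap_alloc();
+ *   if (!hwloc_mx_board_get_device_cpuset(topology, 0, set))
+ *     hwloc_set_cpubind(topology, set, 0);
+ *   hwloc_bitmap_free(set);
+ */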
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_MYRIEXPRESS_H */
diff --git a/ext/hwloc/include/hwloc/nvml.h b/ext/hwloc/include/hwloc/nvml.h new file mode 100644 index 000000000..462b33266 --- /dev/null +++ b/ext/hwloc/include/hwloc/nvml.h @@ -0,0 +1,176 @@
+/*
+ * Copyright © 2012-2013 Inria. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and the NVIDIA Management Library.
+ *
+ * Applications that use both hwloc and the NVIDIA Management Library may want to
+ * include this file so as to get topology information for NVML devices.
+ */
+
+#ifndef HWLOC_NVML_H
+#define HWLOC_NVML_H
+
+#include <hwloc.h>
+#include <hwloc/autogen/config.h>
+#include <hwloc/helper.h>
+#ifdef HWLOC_LINUX_SYS
+#include <hwloc/linux.h>
+#endif
+
+#include <nvml.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_nvml Interoperability with the NVIDIA Management Library
+ *
+ * This interface offers ways to retrieve topology information about
+ * devices managed by the NVIDIA Management Library (NVML).
+ *
+ * @{
+ */
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to NVML device \p device.
+ *
+ * Return the CPU set describing the locality of the NVML device \p device.
+ *
+ * Topology \p topology and device \p device must match the local machine.
+ * I/O devices detection and the NVML component are not needed in the topology.
+ *
+ * The function only returns the locality of the device.
+ * If more information about the device is needed, OS objects should
+ * be used instead, see hwloc_nvml_get_device_osdev()
+ * and hwloc_nvml_get_device_osdev_by_index().
+ *
+ * This function is currently only implemented in a meaningful way for
+ * Linux; other systems will simply get a full cpuset.
+ */
+static __hwloc_inline int
+hwloc_nvml_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
+                             nvmlDevice_t device, hwloc_cpuset_t set)
+{
+#ifdef HWLOC_LINUX_SYS
+  /* If we're on Linux, use the sysfs mechanism to get the local cpus */
+#define HWLOC_NVML_DEVICE_SYSFS_PATH_MAX 128
+  char path[HWLOC_NVML_DEVICE_SYSFS_PATH_MAX];
+  FILE *sysfile = NULL;
+  nvmlReturn_t nvres;
+  nvmlPciInfo_t pci;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  nvres = nvmlDeviceGetPciInfo(device, &pci);
+  if (NVML_SUCCESS != nvres) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  sprintf(path, "/sys/bus/pci/devices/%04x:%02x:%02x.0/local_cpus", pci.domain, pci.bus, pci.device);
+  sysfile = fopen(path, "r");
+  if (!sysfile)
+    return -1;
+
+  hwloc_linux_parse_cpumap_file(sysfile, set);
+  if (hwloc_bitmap_iszero(set))
+    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+
+  fclose(sysfile);
+#else
+  /* Non-Linux systems simply get a full cpuset */
+  hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#endif
+  return 0;
+}
+
+/** \brief Get the hwloc OS device object corresponding to the
+ * NVML device whose index is \p idx.
+ *
+ * Return the OS device object describing the NVML device whose
+ * index is \p idx. Returns NULL if there is none.
+ *
+ * The topology \p topology does not necessarily have to match the current
+ * machine. For instance the topology may be an XML import of a remote host.
+ * I/O devices detection and the NVML component must be enabled in the topology.
+ *
+ * \note The corresponding PCI device object can be obtained by looking
+ * at the OS device parent object.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_nvml_get_device_osdev_by_index(hwloc_topology_t topology, unsigned idx)
+{
+  hwloc_obj_t osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    if (HWLOC_OBJ_OSDEV_GPU == osdev->attr->osdev.type
+        && osdev->name
+        && !strncmp("nvml", osdev->name, 4)
+        && atoi(osdev->name + 4) == (int) idx)
+      return osdev;
+  }
+  return NULL;
+}
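+
+/* Example: bind the current process near NVML device 0 (illustrative
+ * sketch; error checking of the NVML calls is omitted):
+ *
+ *   nvmlDevice_t device;
+ *   nvmlInit();
+ *   nvmlDeviceGetHandleByIndex(0, &device);
+ *   hwloc_cpuset_t set = hwloc_bitmap_alloc();
+ *   if (!hwloc_nvml_get_device_cpuset(topology, device, set))
+ *     hwloc_set_cpubind(topology, set, 0);
+ *   hwloc_bitmap_free(set);
+ */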
+
+/** \brief Get the hwloc OS device object corresponding to NVML device \p device.
+ *
+ * Return the hwloc OS device object that describes the given
+ * NVML device \p device. Return NULL if there is none.
+ *
+ * Topology \p topology and device \p device must match the local machine.
+ * I/O devices detection and the NVML component must be enabled in the topology.
+ * If not, the locality of the object may still be found using
+ * hwloc_nvml_get_device_cpuset().
+ *
+ * \note The corresponding hwloc PCI device may be found by looking
+ * at the result parent pointer.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_nvml_get_device_osdev(hwloc_topology_t topology, nvmlDevice_t device)
+{
+  hwloc_obj_t osdev;
+  nvmlReturn_t nvres;
+  nvmlPciInfo_t pci;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  nvres = nvmlDeviceGetPciInfo(device, &pci);
+  if (NVML_SUCCESS != nvres)
+    return NULL;
+
+  osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    hwloc_obj_t pcidev = osdev->parent;
+    if (strncmp(osdev->name, "nvml", 4))
+      continue;
+    if (pcidev
+        && pcidev->type == HWLOC_OBJ_PCI_DEVICE
+        && pcidev->attr->pcidev.domain == pci.domain
+        && pcidev->attr->pcidev.bus == pci.bus
+        && pcidev->attr->pcidev.dev == pci.device
+        && pcidev->attr->pcidev.func == 0)
+      return osdev;
+  }
+
+  return NULL;
+}
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_NVML_H */
diff --git a/ext/hwloc/include/hwloc/opencl.h b/ext/hwloc/include/hwloc/opencl.h new file mode 100644 index 000000000..00c97580b --- /dev/null +++ b/ext/hwloc/include/hwloc/opencl.h @@ -0,0 +1,199 @@
+/*
+ * Copyright © 2012-2013 Inria. All rights reserved.
+ * Copyright © 2013 Université Bordeaux 1. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and the OpenCL interface.
+ *
+ * Applications that use both hwloc and OpenCL may want to
+ * include this file so as to get topology information for OpenCL devices.
+ */
+
+#ifndef HWLOC_OPENCL_H
+#define HWLOC_OPENCL_H
+
+#include <hwloc.h>
+#include <hwloc/autogen/config.h>
+#include <hwloc/helper.h>
+#ifdef HWLOC_LINUX_SYS
+#include <hwloc/linux.h>
+#endif
+
+#include <CL/cl.h>
+#include <CL/cl_ext.h>
+
+#include <stdio.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_opencl Interoperability with OpenCL
+ *
+ * This interface offers ways to retrieve topology information about
+ * OpenCL devices.
+ *
+ * Only the AMD OpenCL interface currently offers useful locality information
+ * about its devices.
+ *
+ * @{
+ */
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to OpenCL device \p device.
+ *
+ * Return the CPU set describing the locality of the OpenCL device \p device.
+ *
+ * Topology \p topology and device \p device must match the local machine.
+ * I/O devices detection and the OpenCL component are not needed in the topology.
+ *
+ * The function only returns the locality of the device.
+ * If more information about the device is needed, OS objects should
+ * be used instead, see hwloc_opencl_get_device_osdev()
+ * and hwloc_opencl_get_device_osdev_by_index().
+ *
+ * This function is currently only implemented in a meaningful way for
+ * Linux with the AMD OpenCL implementation; other systems will simply
+ * get a full cpuset.
+ */
+static __hwloc_inline int
+hwloc_opencl_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
+                               cl_device_id device __hwloc_attribute_unused,
+                               hwloc_cpuset_t set)
+{
+#if (defined HWLOC_LINUX_SYS) && (defined CL_DEVICE_TOPOLOGY_AMD)
+  /* If we're on Linux + AMD OpenCL, use the AMD extension + the sysfs mechanism to get the local cpus */
+#define HWLOC_OPENCL_DEVICE_SYSFS_PATH_MAX 128
+  char path[HWLOC_OPENCL_DEVICE_SYSFS_PATH_MAX];
+  FILE *sysfile = NULL;
+  cl_device_topology_amd amdtopo;
+  cl_int clret;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  clret = clGetDeviceInfo(device, CL_DEVICE_TOPOLOGY_AMD, sizeof(amdtopo), &amdtopo, NULL);
+  if (CL_SUCCESS != clret) {
+    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+    return 0;
+  }
+  if (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD != amdtopo.raw.type) {
+    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+    return 0;
+  }
+
+  sprintf(path, "/sys/bus/pci/devices/0000:%02x:%02x.%01x/local_cpus", amdtopo.pcie.bus, amdtopo.pcie.device, amdtopo.pcie.function);
+  sysfile = fopen(path, "r");
+  if (!sysfile)
+    return -1;
+
+  hwloc_linux_parse_cpumap_file(sysfile, set);
+  if (hwloc_bitmap_iszero(set))
+    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+
+  fclose(sysfile);
+#else
+  /* Non-Linux + AMD OpenCL systems simply get a full cpuset */
+  hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#endif
+  return 0;
+}
+
+/** \brief Get the hwloc OS device object corresponding to the
+ * OpenCL device for the given indexes.
+ *
+ * Return the OS device object describing the OpenCL device
+ * whose platform index is \p platform_index,
+ * and whose device index within this platform is \p device_index.
+ * Return NULL if there is none.
+ *
+ * The topology \p topology does not necessarily have to match the current
+ * machine. For instance the topology may be an XML import of a remote host.
+ * I/O devices detection and the OpenCL component must be enabled in the topology.
+ *
+ * \note The corresponding PCI device object can be obtained by looking
+ * at the OS device parent object.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_opencl_get_device_osdev_by_index(hwloc_topology_t topology,
+                                       unsigned platform_index, unsigned device_index)
+{
+  unsigned x = (unsigned) -1, y = (unsigned) -1;
+  hwloc_obj_t osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    if (HWLOC_OBJ_OSDEV_COPROC == osdev->attr->osdev.type
+        && osdev->name
+        && sscanf(osdev->name, "opencl%ud%u", &x, &y) == 2
+        && platform_index == x && device_index == y)
+      return osdev;
+  }
+  return NULL;
+}
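+
+/* Example: find the OS device for the first device of the first OpenCL
+ * platform (illustrative sketch):
+ *
+ *   hwloc_obj_t osdev = hwloc_opencl_get_device_osdev_by_index(topology, 0, 0);
+ *   if (osdev)
+ *     printf("OpenCL platform 0 device 0 is %s\n", osdev->name);
+ */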
+
+/** \brief Get the hwloc OS device object corresponding to OpenCL device \p device.
+ *
+ * Return the hwloc OS device object that describes the given
+ * OpenCL device \p device. Return NULL if there is none.
+ *
+ * Topology \p topology and device \p device must match the local machine.
+ * I/O devices detection and the OpenCL component must be enabled in the topology.
+ * If not, the locality of the object may still be found using
+ * hwloc_opencl_get_device_cpuset().
+ *
+ * \note The corresponding hwloc PCI device may be found by looking
+ * at the result parent pointer.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_opencl_get_device_osdev(hwloc_topology_t topology __hwloc_attribute_unused,
+                              cl_device_id device __hwloc_attribute_unused)
+{
+#ifdef CL_DEVICE_TOPOLOGY_AMD
+  hwloc_obj_t osdev;
+  cl_device_topology_amd amdtopo;
+  cl_int clret;
+
+  clret = clGetDeviceInfo(device, CL_DEVICE_TOPOLOGY_AMD, sizeof(amdtopo), &amdtopo, NULL);
+  if (CL_SUCCESS != clret) {
+    errno = EINVAL;
+    return NULL;
+  }
+  if (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD != amdtopo.raw.type) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    hwloc_obj_t pcidev = osdev->parent;
+    if (strncmp(osdev->name, "opencl", 6))
+      continue;
+    if (pcidev
+        && pcidev->type == HWLOC_OBJ_PCI_DEVICE
+        && pcidev->attr->pcidev.domain == 0
+        && pcidev->attr->pcidev.bus == amdtopo.pcie.bus
+        && pcidev->attr->pcidev.dev == amdtopo.pcie.device
+        && pcidev->attr->pcidev.func == amdtopo.pcie.function)
+      return osdev;
+  }
+
+  return NULL;
+#else
+  return NULL;
+#endif
+}
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_OPENCL_H */
diff --git a/ext/hwloc/include/hwloc/openfabrics-verbs.h b/ext/hwloc/include/hwloc/openfabrics-verbs.h new file mode 100644 index 000000000..69f86fe1b --- /dev/null +++ b/ext/hwloc/include/hwloc/openfabrics-verbs.h @@ -0,0 +1,155 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2013 Inria. All rights reserved.
+ * Copyright © 2009-2010 Université Bordeaux 1
+ * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and OpenFabrics
+ * verbs.
+ *
+ * Applications that use both hwloc and OpenFabrics verbs may want to
+ * include this file so as to get topology information for OpenFabrics
+ * hardware.
+ *
+ */
+
+#ifndef HWLOC_OPENFABRICS_VERBS_H
+#define HWLOC_OPENFABRICS_VERBS_H
+
+#include <hwloc.h>
+#include <hwloc/autogen/config.h>
+#ifdef HWLOC_LINUX_SYS
+#include <hwloc/linux.h>
+#endif
+
+#include <infiniband/verbs.h>
+
+
+#ifdef __cplusplus
extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_openfabrics Interoperability with OpenFabrics
+ *
+ * This interface offers ways to retrieve topology information about
+ * OpenFabrics devices.
+ *
+ * @{
+ */
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to device \p ibdev.
+ *
+ * Return the CPU set describing the locality of the OpenFabrics
+ * device \p ibdev.
+ *
+ * Topology \p topology and device \p ibdev must match the local machine.
+ * I/O devices detection is not needed in the topology.
+ *
+ * The function only returns the locality of the device.
+ * If more information about the device is needed, OS objects should
+ * be used instead, see hwloc_ibv_get_device_osdev()
+ * and hwloc_ibv_get_device_osdev_by_name().
+ *
+ * This function is currently only implemented in a meaningful way for
+ * Linux; other systems will simply get a full cpuset.
+ */
+static __hwloc_inline int
+hwloc_ibv_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
+                            struct ibv_device *ibdev, hwloc_cpuset_t set)
+{
+#ifdef HWLOC_LINUX_SYS
+  /* If we're on Linux, use the verbs-provided sysfs mechanism to
+     get the local cpus */
+#define HWLOC_OPENFABRICS_VERBS_SYSFS_PATH_MAX 128
+  char path[HWLOC_OPENFABRICS_VERBS_SYSFS_PATH_MAX];
+  FILE *sysfile = NULL;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  sprintf(path, "/sys/class/infiniband/%s/device/local_cpus",
+          ibv_get_device_name(ibdev));
+  sysfile = fopen(path, "r");
+  if (!sysfile)
+    return -1;
+
+  hwloc_linux_parse_cpumap_file(sysfile, set);
+  if (hwloc_bitmap_iszero(set))
+    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+
+  fclose(sysfile);
+#else
+  /* Non-Linux systems simply get a full cpuset */
+  hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#endif
+  return 0;
+}
+
+/** \brief Get the hwloc OS device object corresponding to the OpenFabrics
+ * device named \p ibname.
+ *
+ * Return the OS device object describing the OpenFabrics device whose
+ * name is \p ibname. Returns NULL if there is none.
+ * The name \p ibname is usually obtained from ibv_get_device_name().
+ *
+ * The topology \p topology does not necessarily have to match the current
+ * machine. For instance the topology may be an XML import of a remote host.
+ * I/O devices detection must be enabled in the topology.
+ *
+ * \note The corresponding PCI device object can be obtained by looking
+ * at the OS device parent object.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_ibv_get_device_osdev_by_name(hwloc_topology_t topology,
+                                   const char *ibname)
+{
+  hwloc_obj_t osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    if (HWLOC_OBJ_OSDEV_OPENFABRICS == osdev->attr->osdev.type
+        && osdev->name && !strcmp(ibname, osdev->name))
+      return osdev;
+  }
+  return NULL;
+}
+
+/** \brief Get the hwloc OS device object corresponding to the OpenFabrics
+ * device \p ibdev.
+ *
+ * Return the OS device object describing the OpenFabrics device \p ibdev.
+ * Returns NULL if there is none.
+ *
+ * Topology \p topology and device \p ibdev must match the local machine.
+ * I/O devices detection must be enabled in the topology.
+ * If not, the locality of the object may still be found using
+ * hwloc_ibv_get_device_cpuset().
+ *
+ * \note The corresponding PCI device object can be obtained by looking
+ * at the OS device parent object.
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_ibv_get_device_osdev(hwloc_topology_t topology,
+                           struct ibv_device *ibdev)
+{
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return NULL;
+  }
+  return hwloc_ibv_get_device_osdev_by_name(topology, ibv_get_device_name(ibdev));
+}
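+
+/* Example: print the locality of every OpenFabrics device (illustrative
+ * sketch; error checking omitted):
+ *
+ *   int i, nb;
+ *   struct ibv_device **list = ibv_get_device_list(&nb);
+ *   hwloc_cpuset_t set = hwloc_bitmap_alloc();
+ *   for(i=0; i<nb; i++) {
+ *     char *s;
+ *     hwloc_ibv_get_device_cpuset(topology, list[i], set);
+ *     hwloc_bitmap_asprintf(&s, set);
+ *     printf("%s is close to %s\n", ibv_get_device_name(list[i]), s);
+ *     free(s);
+ *   }
+ *   hwloc_bitmap_free(set);
+ *   ibv_free_device_list(list);
+ */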
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_OPENFABRICS_VERBS_H */
diff --git a/ext/hwloc/include/hwloc/plugins.h b/ext/hwloc/include/hwloc/plugins.h new file mode 100644 index 000000000..aa5d993c6 --- /dev/null +++ b/ext/hwloc/include/hwloc/plugins.h @@ -0,0 +1,385 @@
+/*
+ * Copyright © 2013 Inria. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#ifndef HWLOC_PLUGINS_H
+#define HWLOC_PLUGINS_H
+
+/** \file
+ * \brief Public interface for building hwloc plugins.
+ */
+
+struct hwloc_backend;
+
+#include <hwloc.h>
+#ifdef HWLOC_INSIDE_PLUGIN
+/* needed for hwloc_plugin_check_namespace() */
+#include <ltdl.h>
+#endif
+
+
+
+/** \defgroup hwlocality_disc_components Components and Plugins: Discovery components
+ * @{
+ */
+
+/** \brief Discovery component type */
+typedef enum hwloc_disc_component_type_e {
+  /** \brief CPU-only discovery through the OS, or generic no-OS support.
+   * \hideinitializer */
+  HWLOC_DISC_COMPONENT_TYPE_CPU = (1<<0),
+
+  /** \brief xml, synthetic or custom,
+   * platform-specific components such as bgq.
+   * Anything that discovers CPUs and everything else.
+   * No misc backend is expected to complement a global component.
+   * \hideinitializer */
+  HWLOC_DISC_COMPONENT_TYPE_GLOBAL = (1<<1),
+
+  /** \brief OpenCL, Cuda, etc.
+   * \hideinitializer */
+  HWLOC_DISC_COMPONENT_TYPE_MISC = (1<<2)
+} hwloc_disc_component_type_t;
+
+/** \brief Discovery component structure
+ *
+ * This is the major kind of components, taking care of the discovery.
+ * They are registered by generic components, either statically-built or as plugins.
+ */
+struct hwloc_disc_component {
+  /** \brief Discovery component type */
+  hwloc_disc_component_type_t type;
+
+  /** \brief Name.
+   * If this component is built as a plugin, this name does not have to match the plugin filename.
+   */
+  const char *name;
+
+  /** \brief Component types to exclude, as an OR'ed set of HWLOC_DISC_COMPONENT_TYPE_*.
+   *
+   * For a GLOBAL component, this usually includes all other types (~0).
+   *
+   * Other components only exclude types that may bring conflicting
+   * topology information. MISC components should likely not be excluded
+   * since they usually bring non-primary additional information.
+   */
+  unsigned excludes;
+
+  /** \brief Instantiate callback to create a backend from the component.
+   * Parameters data1, data2, data3 are NULL except for components
+   * that have special enabling routines such as hwloc_topology_set_xml(). */
+  struct hwloc_backend * (*instantiate)(struct hwloc_disc_component *component, const void *data1, const void *data2, const void *data3);
+
+  /** \brief Component priority.
+   * Used to sort topology->components, higher priority first.
+   * Also used to decide between two components with the same name.
+   *
+   * Usual values are
+   * 50 for native OS (or platform) components,
+   * 45 for x86,
+   * 40 for no-OS fallback,
+   * 30 for global components (xml/synthetic/custom),
+   * 20 for pci,
+   * 10 for other misc components (opencl etc.).
+   */
+  unsigned priority;
+
+  /** \private Used internally to list components by priority on topology->components
+   * (the component structure is usually read-only,
+   * the core copies it before using this field for queueing)
+   */
+  struct hwloc_disc_component * next;
+};
+
+/** @} */
+
+
+
+
+/** \defgroup hwlocality_disc_backends Components and Plugins: Discovery backends
+ * @{
+ */
+
+/** \brief Discovery backend structure
+ *
+ * A backend is the instantiation of a discovery component.
+ * When a component gets enabled for a topology,
+ * its instantiate() callback creates a backend.
+ *
+ * hwloc_backend_alloc() initializes all fields to default values
+ * that the component may change (except "component" and "next")
+ * before enabling the backend with hwloc_backend_enable().
+ */
+struct hwloc_backend {
+  /** \private Reserved for the core, set by hwloc_backend_alloc() */
+  struct hwloc_disc_component * component;
+  /** \private Reserved for the core, set by hwloc_backend_enable() */
+  struct hwloc_topology * topology;
+  /** \private Reserved for the core. Set to 1 if forced through envvar, 0 otherwise. */
+  int envvar_forced;
+  /** \private Reserved for the core. Used internally to list backends in topology->backends. */
+  struct hwloc_backend * next;
+
+  /** \brief Backend flags, as an OR'ed set of HWLOC_BACKEND_FLAG_* */
+  unsigned long flags;
+
+  /** \brief Backend-specific 'is_custom' property.
+   * Shortcut on !strcmp(..->component->name, "custom").
+   * Only the custom component should touch this. */
+  int is_custom;
+
+  /** \brief Backend-specific 'is_thissystem' property.
+   * Set to 0 or 1 if the backend should enforce the thissystem flag when it gets enabled.
+   * Set to -1 if the backend doesn't care (default). */
+  int is_thissystem;
+
+  /** \brief Backend private data, or NULL if none. */
+  void * private_data;
+  /** \brief Callback for freeing the private_data.
+   * May be NULL.
+   */
+  void (*disable)(struct hwloc_backend *backend);
+
+  /** \brief Main discovery callback.
+   * returns > 0 if it modified the topology tree, -1 on error, 0 otherwise.
+   * May be NULL if type is HWLOC_DISC_COMPONENT_TYPE_MISC. */
+  int (*discover)(struct hwloc_backend *backend);
+
+  /** \brief Callback used by the PCI backend to retrieve the locality of a PCI object from the OS/cpu backend.
+   * May be NULL. */
+  int (*get_obj_cpuset)(struct hwloc_backend *backend, struct hwloc_backend *caller, struct hwloc_obj *obj, hwloc_bitmap_t cpuset);
+
+  /** \brief Callback called by backends to notify this backend that a new object was added.
+   * returns > 0 if it modified the topology tree, 0 otherwise.
+   * May be NULL. */
+  int (*notify_new_object)(struct hwloc_backend *backend, struct hwloc_backend *caller, struct hwloc_obj *obj);
+};
+
+/** \brief Backend flags */
+enum hwloc_backend_flag_e {
+  /** \brief Levels should be reconnected before this backend discover() is used.
+   * \hideinitializer */
+  HWLOC_BACKEND_FLAG_NEED_LEVELS = (1UL<<0)
+};
+
+/** \brief Allocate a backend structure, set good default values, initialize backend->component and topology, etc.
+ * The caller will then modify whatever needed, and call hwloc_backend_enable().
+ */
+HWLOC_DECLSPEC struct hwloc_backend * hwloc_backend_alloc(struct hwloc_disc_component *component);
+
+/** \brief Enable a previously allocated and setup backend. */
+HWLOC_DECLSPEC int hwloc_backend_enable(struct hwloc_topology *topology, struct hwloc_backend *backend);
+
+/** \brief Used by backends discovery callbacks to request locality information from others.
+ *
+ * Traverse the list of enabled backends until one has a
+ * get_obj_cpuset() method, and call it.
+ */
+HWLOC_DECLSPEC int hwloc_backends_get_obj_cpuset(struct hwloc_backend *caller, struct hwloc_obj *obj, hwloc_bitmap_t cpuset);
+
+/** \brief Used by backends discovery callbacks to notify other
+ * backends of new objects.
+ *
+ * Traverse the list of enabled backends (all but caller) and invoke
+ * their notify_new_object() method to notify them that a new object
+ * just got added to the topology.
+ *
+ * Currently only used for notifying of new PCI device objects.
+ */
+HWLOC_DECLSPEC int hwloc_backends_notify_new_object(struct hwloc_backend *caller, struct hwloc_obj *obj);
+
+/** @} */
+
+
+
+
+/** \defgroup hwlocality_generic_components Components and Plugins: Generic components
+ * @{
+ */
+
+/** \brief Generic component type */
+typedef enum hwloc_component_type_e {
+  /** \brief The data field must point to a struct hwloc_disc_component. */
+  HWLOC_COMPONENT_TYPE_DISC,
+
+  /** \brief The data field must point to a struct hwloc_xml_component. */
+  HWLOC_COMPONENT_TYPE_XML
+} hwloc_component_type_t;
+
+/** \brief Generic component structure
+ *
+ * Generic components structure, either statically listed by configure in static-components.h
+ * or dynamically loaded as a plugin.
+ */
+struct hwloc_component {
+  /** \brief Component ABI version, set to HWLOC_COMPONENT_ABI */
+  unsigned abi;
+
+  /** \brief Component type */
+  hwloc_component_type_t type;
+
+  /** \brief Component flags, unused for now */
+  unsigned long flags;
+
+  /** \brief Component data, pointing to a struct hwloc_disc_component or struct hwloc_xml_component. */
+  void * data;
+};
+
+/** @} */
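+
+/* Example: skeleton of a MISC discovery component exposed through a generic
+ * component (illustrative sketch; the "mydisc" name, the exported symbol
+ * name, and the empty discover() body are placeholders):
+ *
+ *   static int mydisc_discover(struct hwloc_backend *backend) {
+ *     return 0; (annotate or extend the topology here, return >0 if modified)
+ *   }
+ *   static struct hwloc_backend *
+ *   mydisc_instantiate(struct hwloc_disc_component *component,
+ *                      const void *d1, const void *d2, const void *d3) {
+ *     struct hwloc_backend *backend = hwloc_backend_alloc(component);
+ *     if (backend)
+ *       backend->discover = mydisc_discover;
+ *     return backend;
+ *   }
+ *   static struct hwloc_disc_component mydisc_disc_component = {
+ *     HWLOC_DISC_COMPONENT_TYPE_MISC, "mydisc", 0,
+ *     mydisc_instantiate, 10, NULL
+ *   };
+ *   struct hwloc_component hwloc_mydisc_component = {
+ *     HWLOC_COMPONENT_ABI, HWLOC_COMPONENT_TYPE_DISC, 0,
+ *     &mydisc_disc_component
+ *   };
+ */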
+
+
+
+/** \defgroup hwlocality_components_core_funcs Components and Plugins: Core functions to be used by components
+ * @{
+ */
+
+/** \brief Add an object to the topology.
+ *
+ * It is sorted along the tree of other objects according to the inclusion of
+ * cpusets, to eventually be added as a child of the smallest object including
+ * this object.
+ *
+ * If the cpuset is empty, the type of the object (and maybe some attributes)
+ * must be enough to find where to insert the object. This is especially true
+ * for NUMA nodes with memory and no CPUs.
+ *
+ * The given object should not have children.
+ *
+ * This shall only be called before levels are built.
+ *
+ * In case of error, hwloc_report_os_error() is called.
+ *
+ * Returns the object on success.
+ * Returns NULL and frees obj on error.
+ * Returns another object and frees obj if it was merged with an identical pre-existing object.
+ */
+HWLOC_DECLSPEC struct hwloc_obj *hwloc_insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t obj);
+
+/** \brief Type of error callbacks during object insertion */
+typedef void (*hwloc_report_error_t)(const char * msg, int line);
+/** \brief Report an insertion error from a backend */
+HWLOC_DECLSPEC void hwloc_report_os_error(const char * msg, int line);
+/** \brief Check whether insertion errors are hidden */
+HWLOC_DECLSPEC int hwloc_hide_errors(void);
+
+/** \brief Add an object to the topology and specify which error callback to use.
+ *
+ * Aside from the error callback selection, this function is identical to hwloc_insert_object_by_cpuset()
+ */
+HWLOC_DECLSPEC struct hwloc_obj *hwloc__insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t obj, hwloc_report_error_t report_error);
+
+/** \brief Insert an object somewhere in the topology.
+ *
+ * It is added as the last child of the given parent.
+ * The cpuset is completely ignored, so strange objects such as I/O devices should
+ * preferably be inserted with this.
+ *
+ * The given object may have children.
+ *
+ * Remember to call topology_connect() afterwards to fix handy pointers.
+ */
+HWLOC_DECLSPEC void hwloc_insert_object_by_parent(struct hwloc_topology *topology, hwloc_obj_t parent, hwloc_obj_t obj);
+
+/** \brief Allocate and initialize an object of the given type and physical index */
+static __hwloc_inline struct hwloc_obj *
+hwloc_alloc_setup_object(hwloc_obj_type_t type, signed os_index)
+{
+  struct hwloc_obj *obj = malloc(sizeof(*obj));
+  memset(obj, 0, sizeof(*obj));
+  obj->type = type;
+  obj->os_index = os_index;
+  obj->os_level = -1;
+  obj->attr = malloc(sizeof(*obj->attr));
+  memset(obj->attr, 0, sizeof(*obj->attr));
+  /* do not allocate the cpuset here, let the caller do it */
+  return obj;
+}
+
+/** \brief Setup object cpusets/nodesets by OR'ing its children.
+ *
+ * Used when adding an object late in the topology, after propagating sets up and down.
+ * The caller should use this after inserting by cpuset (which means the cpusets is already OK).
+ * Typical case: PCI backend adding a hostbridge parent.
+ */
+HWLOC_DECLSPEC int hwloc_fill_object_sets(hwloc_obj_t obj);
+
+/** \brief Insert a list of PCI devices and bridges in the backend topology.
+ *
+ * Insert a list of objects (either PCI device or bridges) starting at first_obj
+ * (linked by next_sibling in the topology, and ending with NULL).
+ * Objects are placed under the right bridges, and the remaining upstream bridges
+ * are then inserted in the topology by calling the get_obj_cpuset() callback to
+ * find their locality.
+ */
+HWLOC_DECLSPEC int hwloc_insert_pci_device_list(struct hwloc_backend *backend, struct hwloc_obj *first_obj);
+
+/** \brief Return the offset of the given capability in the PCI config space buffer
+ *
+ * This function requires a 256-bytes config space. Unknown/unavailable bytes should be set to 0xff.
+ */
+HWLOC_DECLSPEC unsigned hwloc_pci_find_cap(const unsigned char *config, unsigned cap);
+
+/** \brief Fill linkspeed by reading the PCI config space where PCI_CAP_ID_EXP is at position offset.
+ *
+ * Needs 20 bytes of EXP capability block starting at offset in the config space
+ * for registers up to link status.
+ */
+HWLOC_DECLSPEC int hwloc_pci_find_linkspeed(const unsigned char *config, unsigned offset, float *linkspeed);
+
+/** \brief Modify the PCI device object into a bridge and fill its attribute if a bridge is found in the PCI config space.
+ *
+ * This function requires 64 bytes of common configuration header at the beginning of config.
+ */
+HWLOC_DECLSPEC int hwloc_pci_prepare_bridge(hwloc_obj_t obj, const unsigned char *config);
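+
+/* Example: inside a discover() callback, add a Group object covering two
+ * PUs (illustrative sketch; cpuset contents are placeholders):
+ *
+ *   hwloc_obj_t group = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, -1);
+ *   group->cpuset = hwloc_bitmap_alloc();
+ *   hwloc_bitmap_set_range(group->cpuset, 0, 1);
+ *   hwloc_insert_object_by_cpuset(backend->topology, group);
+ */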
+
+/** \brief Make sure that plugins can lookup core symbols.
+ *
+ * This is a sanity check to avoid lazy-lookup failures when libhwloc
+ * is loaded within a plugin, and later tries to load its own plugins.
+ * This may fail (and abort the program) if libhwloc symbols are in a
+ * private namespace.
+ *
+ * Plugins should call this function as an early sanity check to avoid
+ * later crashes if lazy symbol resolution is used by the upper layer that
+ * loaded hwloc (e.g. OpenCL implementations using dlopen with RTLD_LAZY).
+ *
+ * \note The build system must define HWLOC_INSIDE_PLUGIN if and only if
+ * building the caller as a plugin.
+ */
+static __hwloc_inline int
+hwloc_plugin_check_namespace(const char *pluginname __hwloc_attribute_unused, const char *symbol __hwloc_attribute_unused)
+{
+#ifdef HWLOC_INSIDE_PLUGIN
+  lt_dlhandle handle;
+  void *sym;
+  handle = lt_dlopen(NULL);
+  if (!handle)
+    /* cannot check, assume things will work */
+    return 0;
+  sym = lt_dlsym(handle, symbol);
+  lt_dlclose(handle);
+  if (!sym) {
+    static int verboseenv_checked = 0;
+    static int verboseenv_value = 0;
+    if (!verboseenv_checked) {
+      char *verboseenv = getenv("HWLOC_PLUGINS_VERBOSE");
+      verboseenv_value = verboseenv ? atoi(verboseenv) : 0;
+      verboseenv_checked = 1;
+    }
+    if (verboseenv_value)
+      fprintf(stderr, "Plugin `%s' disabling itself because it cannot find the `%s' core symbol.\n",
+              pluginname, symbol);
+    return -1;
+  }
+#endif /* HWLOC_INSIDE_PLUGIN */
+  return 0;
+}
+
+/** @} */
+
+
+
+
+#endif /* HWLOC_PLUGINS_H */
diff --git a/ext/hwloc/include/hwloc/rename.h b/ext/hwloc/include/hwloc/rename.h new file mode 100644 index 000000000..ab0bf389c --- /dev/null +++ b/ext/hwloc/include/hwloc/rename.h @@ -0,0 +1,625 @@
+/*
+ * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
+ * Copyright © 2010-2013 Inria. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#ifndef HWLOC_RENAME_H
+#define HWLOC_RENAME_H
+
+#include <hwloc/autogen/config.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/* Only enact these defines if we're actually renaming the symbols
+   (i.e., avoid trying to have no-op defines if we're *not*
+   renaming). */
+
+#if HWLOC_SYM_TRANSFORM
+
+/* Use a preprocessor two-step in order to get the prefixing right.
+   Make 2 macros: HWLOC_NAME and HWLOC_NAME_CAPS for renaming
+   things. */
+
+#define HWLOC_MUNGE_NAME(a, b) HWLOC_MUNGE_NAME2(a, b)
+#define HWLOC_MUNGE_NAME2(a, b) a ## b
+#define HWLOC_NAME(name) HWLOC_MUNGE_NAME(HWLOC_SYM_PREFIX, hwloc_ ## name)
+#define HWLOC_NAME_CAPS(name) HWLOC_MUNGE_NAME(HWLOC_SYM_PREFIX_CAPS, hwloc_ ## name)
+
+/* Now define all the "real" names to be the prefixed names. This
+   allows us to use the real names throughout the code base (i.e.,
+   "hwloc_"); the preprocessor will adjust to have the prefixed
+   name under the covers.
*/
+
+/* Names from hwloc.h */
+
+#define hwloc_get_api_version HWLOC_NAME(get_api_version)
+
+#define hwloc_topology HWLOC_NAME(topology)
+#define hwloc_topology_t HWLOC_NAME(topology_t)
+
+#define hwloc_cpuset_t HWLOC_NAME(cpuset_t)
+#define hwloc_const_cpuset_t HWLOC_NAME(const_cpuset_t)
+#define hwloc_nodeset_t HWLOC_NAME(nodeset_t)
+#define hwloc_const_nodeset_t HWLOC_NAME(const_nodeset_t)
+
+#define HWLOC_OBJ_SYSTEM HWLOC_NAME_CAPS(OBJ_SYSTEM)
+#define HWLOC_OBJ_MACHINE HWLOC_NAME_CAPS(OBJ_MACHINE)
+#define HWLOC_OBJ_NODE HWLOC_NAME_CAPS(OBJ_NODE)
+#define HWLOC_OBJ_SOCKET HWLOC_NAME_CAPS(OBJ_SOCKET)
+#define HWLOC_OBJ_CACHE HWLOC_NAME_CAPS(OBJ_CACHE)
+#define HWLOC_OBJ_CORE HWLOC_NAME_CAPS(OBJ_CORE)
+#define HWLOC_OBJ_PU HWLOC_NAME_CAPS(OBJ_PU)
+#define HWLOC_OBJ_MISC HWLOC_NAME_CAPS(OBJ_MISC)
+#define HWLOC_OBJ_GROUP HWLOC_NAME_CAPS(OBJ_GROUP)
+#define HWLOC_OBJ_BRIDGE HWLOC_NAME_CAPS(OBJ_BRIDGE)
+#define HWLOC_OBJ_PCI_DEVICE HWLOC_NAME_CAPS(OBJ_PCI_DEVICE)
+#define HWLOC_OBJ_OS_DEVICE HWLOC_NAME_CAPS(OBJ_OS_DEVICE)
+#define HWLOC_OBJ_TYPE_MAX HWLOC_NAME_CAPS(OBJ_TYPE_MAX)
+#define hwloc_obj_type_t HWLOC_NAME(obj_type_t)
+
+#define hwloc_obj_cache_type_e HWLOC_NAME(obj_cache_type_e)
+#define hwloc_obj_cache_type_t HWLOC_NAME(obj_cache_type_t)
+#define HWLOC_OBJ_CACHE_UNIFIED HWLOC_NAME_CAPS(OBJ_CACHE_UNIFIED)
+#define HWLOC_OBJ_CACHE_DATA HWLOC_NAME_CAPS(OBJ_CACHE_DATA)
+#define HWLOC_OBJ_CACHE_INSTRUCTION HWLOC_NAME_CAPS(OBJ_CACHE_INSTRUCTION)
+
+#define hwloc_obj_bridge_type_e HWLOC_NAME(obj_bridge_type_e)
+#define hwloc_obj_bridge_type_t HWLOC_NAME(obj_bridge_type_t)
+#define HWLOC_OBJ_BRIDGE_HOST HWLOC_NAME_CAPS(OBJ_BRIDGE_HOST)
+#define HWLOC_OBJ_BRIDGE_PCI HWLOC_NAME_CAPS(OBJ_BRIDGE_PCI)
+
+#define hwloc_obj_osdev_type_e HWLOC_NAME(obj_osdev_type_e)
+#define hwloc_obj_osdev_type_t HWLOC_NAME(obj_osdev_type_t)
+#define HWLOC_OBJ_OSDEV_BLOCK HWLOC_NAME_CAPS(OBJ_OSDEV_BLOCK)
+#define HWLOC_OBJ_OSDEV_GPU HWLOC_NAME_CAPS(OBJ_OSDEV_GPU)
+#define HWLOC_OBJ_OSDEV_NETWORK HWLOC_NAME_CAPS(OBJ_OSDEV_NETWORK)
+#define HWLOC_OBJ_OSDEV_OPENFABRICS HWLOC_NAME_CAPS(OBJ_OSDEV_OPENFABRICS)
+#define HWLOC_OBJ_OSDEV_DMA HWLOC_NAME_CAPS(OBJ_OSDEV_DMA)
+#define HWLOC_OBJ_OSDEV_COPROC HWLOC_NAME_CAPS(OBJ_OSDEV_COPROC)
+
+#define hwloc_compare_types HWLOC_NAME(compare_types)
+
+#define hwloc_compare_types_e HWLOC_NAME(compare_types_e)
+#define HWLOC_TYPE_UNORDERED HWLOC_NAME_CAPS(TYPE_UNORDERED)
+
+#define hwloc_obj_memory_s HWLOC_NAME(obj_memory_s)
+#define hwloc_obj_memory_page_type_s HWLOC_NAME(obj_memory_page_type_s)
+
+#define hwloc_obj HWLOC_NAME(obj)
+#define hwloc_obj_t HWLOC_NAME(obj_t)
+
+#define hwloc_distances_s HWLOC_NAME(distances_s)
+#define hwloc_obj_info_s HWLOC_NAME(obj_info_s)
+
+#define hwloc_obj_attr_u HWLOC_NAME(obj_attr_u)
+#define hwloc_cache_attr_s HWLOC_NAME(cache_attr_s)
+#define hwloc_group_attr_s HWLOC_NAME(group_attr_s)
+#define hwloc_pcidev_attr_s HWLOC_NAME(pcidev_attr_s)
+#define hwloc_bridge_attr_s HWLOC_NAME(bridge_attr_s)
+#define hwloc_osdev_attr_s HWLOC_NAME(osdev_attr_s)
+
+#define hwloc_topology_init HWLOC_NAME(topology_init)
+#define hwloc_topology_load HWLOC_NAME(topology_load)
+#define hwloc_topology_destroy HWLOC_NAME(topology_destroy)
+#define hwloc_topology_check HWLOC_NAME(topology_check)
+#define hwloc_topology_ignore_type HWLOC_NAME(topology_ignore_type)
+#define hwloc_topology_ignore_type_keep_structure HWLOC_NAME(topology_ignore_type_keep_structure)
+#define hwloc_topology_ignore_all_keep_structure HWLOC_NAME(topology_ignore_all_keep_structure)
+
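+/* For instance, when a package is configured with HWLOC_SYM_TRANSFORM=1 and
+   HWLOC_SYM_PREFIX=mypkg_, a call such as
+       hwloc_topology_init(&topology);
+   is transparently compiled into
+       mypkg_hwloc_topology_init(&topology);
+   so that the embedded hwloc does not clash with another hwloc loaded in
+   the same process.  (Illustrative note; "mypkg_" is a placeholder.) */
+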
+#define hwloc_topology_flags_e HWLOC_NAME(topology_flags_e)
+
+#define HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM HWLOC_NAME_CAPS(TOPOLOGY_FLAG_WHOLE_SYSTEM)
+#define HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM HWLOC_NAME_CAPS(TOPOLOGY_FLAG_IS_THISSYSTEM)
+#define HWLOC_TOPOLOGY_FLAG_IO_DEVICES HWLOC_NAME_CAPS(TOPOLOGY_FLAG_IO_DEVICES)
+#define HWLOC_TOPOLOGY_FLAG_IO_BRIDGES HWLOC_NAME_CAPS(TOPOLOGY_FLAG_IO_BRIDGES)
+#define HWLOC_TOPOLOGY_FLAG_WHOLE_IO HWLOC_NAME_CAPS(TOPOLOGY_FLAG_WHOLE_IO)
+#define HWLOC_TOPOLOGY_FLAG_ICACHES HWLOC_NAME_CAPS(TOPOLOGY_FLAG_ICACHES)
+
+#define hwloc_topology_set_flags HWLOC_NAME(topology_set_flags)
+#define hwloc_topology_set_fsroot HWLOC_NAME(topology_set_fsroot)
+#define hwloc_topology_set_pid HWLOC_NAME(topology_set_pid)
+#define hwloc_topology_set_synthetic HWLOC_NAME(topology_set_synthetic)
+#define hwloc_topology_set_xml HWLOC_NAME(topology_set_xml)
+#define hwloc_topology_set_xmlbuffer HWLOC_NAME(topology_set_xmlbuffer)
+#define hwloc_topology_set_custom HWLOC_NAME(topology_set_custom)
+#define hwloc_topology_set_distance_matrix HWLOC_NAME(topology_set_distance_matrix)
+
+#define hwloc_topology_discovery_support HWLOC_NAME(topology_discovery_support)
+#define hwloc_topology_cpubind_support HWLOC_NAME(topology_cpubind_support)
+#define hwloc_topology_membind_support HWLOC_NAME(topology_membind_support)
+#define hwloc_topology_support HWLOC_NAME(topology_support)
+#define hwloc_topology_get_support HWLOC_NAME(topology_get_support)
+#define hwloc_topology_export_xml HWLOC_NAME(topology_export_xml)
+#define hwloc_topology_export_xmlbuffer HWLOC_NAME(topology_export_xmlbuffer)
+#define hwloc_free_xmlbuffer HWLOC_NAME(free_xmlbuffer)
+#define hwloc_topology_set_userdata_export_callback HWLOC_NAME(topology_set_userdata_export_callback)
+#define hwloc_export_obj_userdata HWLOC_NAME(export_obj_userdata)
+#define hwloc_export_obj_userdata_base64 HWLOC_NAME(export_obj_userdata_base64)
+#define hwloc_topology_set_userdata_import_callback HWLOC_NAME(topology_set_userdata_import_callback)
+
+#define hwloc_topology_insert_misc_object_by_cpuset HWLOC_NAME(topology_insert_misc_object_by_cpuset)
+#define hwloc_topology_insert_misc_object_by_parent HWLOC_NAME(topology_insert_misc_object_by_parent)
+
+#define hwloc_custom_insert_topology HWLOC_NAME(custom_insert_topology)
+#define hwloc_custom_insert_group_object_by_parent HWLOC_NAME(custom_insert_group_object_by_parent)
+
+#define hwloc_restrict_flags_e HWLOC_NAME(restrict_flags_e)
+#define HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES HWLOC_NAME_CAPS(RESTRICT_FLAG_ADAPT_DISTANCES)
+#define HWLOC_RESTRICT_FLAG_ADAPT_MISC HWLOC_NAME_CAPS(RESTRICT_FLAG_ADAPT_MISC)
+#define HWLOC_RESTRICT_FLAG_ADAPT_IO HWLOC_NAME_CAPS(RESTRICT_FLAG_ADAPT_IO)
+#define hwloc_topology_restrict HWLOC_NAME(topology_restrict)
+#define hwloc_topology_dup HWLOC_NAME(topology_dup)
+
+#define hwloc_topology_get_depth HWLOC_NAME(topology_get_depth)
+#define hwloc_get_type_depth HWLOC_NAME(get_type_depth)
+
+#define hwloc_get_type_depth_e HWLOC_NAME(get_type_depth_e)
+#define HWLOC_TYPE_DEPTH_UNKNOWN HWLOC_NAME_CAPS(TYPE_DEPTH_UNKNOWN)
+#define HWLOC_TYPE_DEPTH_MULTIPLE HWLOC_NAME_CAPS(TYPE_DEPTH_MULTIPLE)
+#define HWLOC_TYPE_DEPTH_BRIDGE HWLOC_NAME_CAPS(TYPE_DEPTH_BRIDGE)
+#define HWLOC_TYPE_DEPTH_PCI_DEVICE HWLOC_NAME_CAPS(TYPE_DEPTH_PCI_DEVICE)
+#define HWLOC_TYPE_DEPTH_OS_DEVICE HWLOC_NAME_CAPS(TYPE_DEPTH_OS_DEVICE)
+
+#define hwloc_get_depth_type HWLOC_NAME(get_depth_type)
+#define hwloc_get_nbobjs_by_depth HWLOC_NAME(get_nbobjs_by_depth)
+#define hwloc_get_nbobjs_by_type HWLOC_NAME(get_nbobjs_by_type)
HWLOC_NAME(get_nbobjs_by_type) + +#define hwloc_topology_is_thissystem HWLOC_NAME(topology_is_thissystem) +#define hwloc_topology_get_flags HWLOC_NAME(topology_get_flags) + +#define hwloc_get_obj_by_depth HWLOC_NAME(get_obj_by_depth ) +#define hwloc_get_obj_by_type HWLOC_NAME(get_obj_by_type ) + +#define hwloc_obj_type_string HWLOC_NAME(obj_type_string ) +#define hwloc_obj_type_of_string HWLOC_NAME(obj_type_of_string ) +#define hwloc_obj_type_snprintf HWLOC_NAME(obj_type_snprintf ) +#define hwloc_obj_attr_snprintf HWLOC_NAME(obj_attr_snprintf ) +#define hwloc_obj_cpuset_snprintf HWLOC_NAME(obj_cpuset_snprintf) +#define hwloc_obj_get_info_by_name HWLOC_NAME(obj_get_info_by_name) +#define hwloc_obj_add_info HWLOC_NAME(obj_add_info) + +#define HWLOC_CPUBIND_PROCESS HWLOC_NAME_CAPS(CPUBIND_PROCESS) +#define HWLOC_CPUBIND_THREAD HWLOC_NAME_CAPS(CPUBIND_THREAD) +#define HWLOC_CPUBIND_STRICT HWLOC_NAME_CAPS(CPUBIND_STRICT) +#define HWLOC_CPUBIND_NOMEMBIND HWLOC_NAME_CAPS(CPUBIND_NOMEMBIND) + +#define hwloc_cpubind_flags_t HWLOC_NAME(cpubind_flags_t) + +#define hwloc_set_cpubind HWLOC_NAME(set_cpubind) +#define hwloc_get_cpubind HWLOC_NAME(get_cpubind) +#define hwloc_set_proc_cpubind HWLOC_NAME(set_proc_cpubind) +#define hwloc_get_proc_cpubind HWLOC_NAME(get_proc_cpubind) +#define hwloc_set_thread_cpubind HWLOC_NAME(set_thread_cpubind) +#define hwloc_get_thread_cpubind HWLOC_NAME(get_thread_cpubind) + +#define hwloc_get_last_cpu_location HWLOC_NAME(get_last_cpu_location) +#define hwloc_get_proc_last_cpu_location HWLOC_NAME(get_proc_last_cpu_location) + +#define HWLOC_MEMBIND_DEFAULT HWLOC_NAME_CAPS(MEMBIND_DEFAULT) +#define HWLOC_MEMBIND_FIRSTTOUCH HWLOC_NAME_CAPS(MEMBIND_FIRSTTOUCH) +#define HWLOC_MEMBIND_BIND HWLOC_NAME_CAPS(MEMBIND_BIND) +#define HWLOC_MEMBIND_INTERLEAVE HWLOC_NAME_CAPS(MEMBIND_INTERLEAVE) +#define HWLOC_MEMBIND_REPLICATE HWLOC_NAME_CAPS(MEMBIND_REPLICATE) +#define HWLOC_MEMBIND_NEXTTOUCH HWLOC_NAME_CAPS(MEMBIND_NEXTTOUCH) +#define HWLOC_MEMBIND_MIXED HWLOC_NAME_CAPS(MEMBIND_MIXED) + +#define hwloc_membind_policy_t HWLOC_NAME(membind_policy_t) + +#define HWLOC_MEMBIND_PROCESS HWLOC_NAME_CAPS(MEMBIND_PROCESS) +#define HWLOC_MEMBIND_THREAD HWLOC_NAME_CAPS(MEMBIND_THREAD) +#define HWLOC_MEMBIND_STRICT HWLOC_NAME_CAPS(MEMBIND_STRICT) +#define HWLOC_MEMBIND_MIGRATE HWLOC_NAME_CAPS(MEMBIND_MIGRATE) +#define HWLOC_MEMBIND_NOCPUBIND HWLOC_NAME_CAPS(MEMBIND_NOCPUBIND) + +#define hwloc_membind_flags_t HWLOC_NAME(membind_flags_t) + +#define hwloc_set_membind_nodeset HWLOC_NAME(set_membind_nodeset) +#define hwloc_set_membind HWLOC_NAME(set_membind) +#define hwloc_get_membind_nodeset HWLOC_NAME(get_membind_nodeset) +#define hwloc_get_membind HWLOC_NAME(get_membind) +#define hwloc_set_proc_membind_nodeset HWLOC_NAME(set_proc_membind_nodeset) +#define hwloc_set_proc_membind HWLOC_NAME(set_proc_membind) +#define hwloc_get_proc_membind_nodeset HWLOC_NAME(get_proc_membind_nodeset) +#define hwloc_get_proc_membind HWLOC_NAME(get_proc_membind) +#define hwloc_set_area_membind_nodeset HWLOC_NAME(set_area_membind_nodeset) +#define hwloc_set_area_membind HWLOC_NAME(set_area_membind) +#define hwloc_get_area_membind_nodeset HWLOC_NAME(get_area_membind_nodeset) +#define hwloc_get_area_membind HWLOC_NAME(get_area_membind) +#define hwloc_alloc_membind_nodeset HWLOC_NAME(alloc_membind_nodeset) +#define hwloc_alloc_membind HWLOC_NAME(alloc_membind) +#define hwloc_alloc HWLOC_NAME(alloc) +#define hwloc_free HWLOC_NAME(free) + +#define hwloc_get_non_io_ancestor_obj HWLOC_NAME(get_non_io_ancestor_obj) +#define 
hwloc_get_next_pcidev HWLOC_NAME(get_next_pcidev) +#define hwloc_get_pcidev_by_busid HWLOC_NAME(get_pcidev_by_busid) +#define hwloc_get_pcidev_by_busidstring HWLOC_NAME(get_pcidev_by_busidstring) +#define hwloc_get_next_osdev HWLOC_NAME(get_next_osdev) +#define hwloc_get_next_bridge HWLOC_NAME(get_next_bridge) +#define hwloc_bridge_covers_pcibus HWLOC_NAME(bridge_covers_pcibus) +#define hwloc_get_hostbridge_by_pcibus HWLOC_NAME(get_hostbridge_by_pcibus) + +/* hwloc/bitmap.h */ + +#define hwloc_bitmap_s HWLOC_NAME(bitmap_s) +#define hwloc_bitmap_t HWLOC_NAME(bitmap_t) +#define hwloc_const_bitmap_t HWLOC_NAME(const_bitmap_t) + +#define hwloc_bitmap_alloc HWLOC_NAME(bitmap_alloc) +#define hwloc_bitmap_alloc_full HWLOC_NAME(bitmap_alloc_full) +#define hwloc_bitmap_free HWLOC_NAME(bitmap_free) +#define hwloc_bitmap_dup HWLOC_NAME(bitmap_dup) +#define hwloc_bitmap_copy HWLOC_NAME(bitmap_copy) +#define hwloc_bitmap_snprintf HWLOC_NAME(bitmap_snprintf) +#define hwloc_bitmap_asprintf HWLOC_NAME(bitmap_asprintf) +#define hwloc_bitmap_sscanf HWLOC_NAME(bitmap_sscanf) +#define hwloc_bitmap_list_snprintf HWLOC_NAME(bitmap_list_snprintf) +#define hwloc_bitmap_list_asprintf HWLOC_NAME(bitmap_list_asprintf) +#define hwloc_bitmap_list_sscanf HWLOC_NAME(bitmap_list_sscanf) +#define hwloc_bitmap_taskset_snprintf HWLOC_NAME(bitmap_taskset_snprintf) +#define hwloc_bitmap_taskset_asprintf HWLOC_NAME(bitmap_taskset_asprintf) +#define hwloc_bitmap_taskset_sscanf HWLOC_NAME(bitmap_taskset_sscanf) +#define hwloc_bitmap_zero HWLOC_NAME(bitmap_zero) +#define hwloc_bitmap_fill HWLOC_NAME(bitmap_fill) +#define hwloc_bitmap_from_ulong HWLOC_NAME(bitmap_from_ulong) + +#define hwloc_bitmap_from_ith_ulong HWLOC_NAME(bitmap_from_ith_ulong) +#define hwloc_bitmap_to_ulong HWLOC_NAME(bitmap_to_ulong) +#define hwloc_bitmap_to_ith_ulong HWLOC_NAME(bitmap_to_ith_ulong) +#define hwloc_bitmap_only HWLOC_NAME(bitmap_only) +#define hwloc_bitmap_allbut HWLOC_NAME(bitmap_allbut) +#define hwloc_bitmap_set HWLOC_NAME(bitmap_set) +#define hwloc_bitmap_set_range HWLOC_NAME(bitmap_set_range) +#define hwloc_bitmap_set_ith_ulong HWLOC_NAME(bitmap_set_ith_ulong) +#define hwloc_bitmap_clr HWLOC_NAME(bitmap_clr) +#define hwloc_bitmap_clr_range HWLOC_NAME(bitmap_clr_range) +#define hwloc_bitmap_isset HWLOC_NAME(bitmap_isset) +#define hwloc_bitmap_iszero HWLOC_NAME(bitmap_iszero) +#define hwloc_bitmap_isfull HWLOC_NAME(bitmap_isfull) +#define hwloc_bitmap_isequal HWLOC_NAME(bitmap_isequal) +#define hwloc_bitmap_intersects HWLOC_NAME(bitmap_intersects) +#define hwloc_bitmap_isincluded HWLOC_NAME(bitmap_isincluded) +#define hwloc_bitmap_or HWLOC_NAME(bitmap_or) +#define hwloc_bitmap_and HWLOC_NAME(bitmap_and) +#define hwloc_bitmap_andnot HWLOC_NAME(bitmap_andnot) +#define hwloc_bitmap_xor HWLOC_NAME(bitmap_xor) +#define hwloc_bitmap_not HWLOC_NAME(bitmap_not) +#define hwloc_bitmap_first HWLOC_NAME(bitmap_first) +#define hwloc_bitmap_last HWLOC_NAME(bitmap_last) +#define hwloc_bitmap_next HWLOC_NAME(bitmap_next) +#define hwloc_bitmap_singlify HWLOC_NAME(bitmap_singlify) +#define hwloc_bitmap_compare_first HWLOC_NAME(bitmap_compare_first) +#define hwloc_bitmap_compare HWLOC_NAME(bitmap_compare) +#define hwloc_bitmap_weight HWLOC_NAME(bitmap_weight) + +/* hwloc/helper.h */ + +#define hwloc_get_type_or_below_depth HWLOC_NAME(get_type_or_below_depth) +#define hwloc_get_type_or_above_depth HWLOC_NAME(get_type_or_above_depth) +#define hwloc_get_root_obj HWLOC_NAME(get_root_obj) +#define hwloc_get_ancestor_obj_by_depth HWLOC_NAME(get_ancestor_obj_by_depth) 
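Every rename in this file is an instance of the same idiom: under HWLOC_SYM_TRANSFORM, each public hwloc symbol is token-pasted onto a configurable prefix so that this embedded copy of hwloc cannot clash with a system-installed one at link time. A minimal sketch of the two-level pasting trick (illustrative only; MY_PREFIX, MUNGE and RENAME are hypothetical stand-ins for hwloc's HWLOC_SYM_PREFIX / HWLOC_NAME machinery defined earlier in this header):

#include <stdio.h>

#define MY_PREFIX mylib_           /* stand-in for the configured symbol prefix */
#define MUNGE2(a, b) a ## b        /* performs the actual pasting */
#define MUNGE(a, b) MUNGE2(a, b)   /* extra level so 'a' is macro-expanded first */
#define RENAME(name) MUNGE(MY_PREFIX, name)

/* Defines a function whose real linker-visible name is
   mylib_hwloc_topology_get_depth. */
int RENAME(hwloc_topology_get_depth)(void) { return 3; }

int main(void)
{
    printf("depth levels: %d\n", mylib_hwloc_topology_get_depth());
    return 0;
}

The indirection through MUNGE matters: pasting directly in one macro would concatenate the literal token MY_PREFIX rather than its expansion mylib_.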
+#define hwloc_get_ancestor_obj_by_type HWLOC_NAME(get_ancestor_obj_by_type) +#define hwloc_get_next_obj_by_depth HWLOC_NAME(get_next_obj_by_depth) +#define hwloc_get_next_obj_by_type HWLOC_NAME(get_next_obj_by_type) +#define hwloc_get_pu_obj_by_os_index HWLOC_NAME(get_pu_obj_by_os_index) +#define hwloc_get_next_child HWLOC_NAME(get_next_child) +#define hwloc_get_common_ancestor_obj HWLOC_NAME(get_common_ancestor_obj) +#define hwloc_obj_is_in_subtree HWLOC_NAME(obj_is_in_subtree) +#define hwloc_get_first_largest_obj_inside_cpuset HWLOC_NAME(get_first_largest_obj_inside_cpuset) +#define hwloc_get_largest_objs_inside_cpuset HWLOC_NAME(get_largest_objs_inside_cpuset) +#define hwloc_get_next_obj_inside_cpuset_by_depth HWLOC_NAME(get_next_obj_inside_cpuset_by_depth) +#define hwloc_get_next_obj_inside_cpuset_by_type HWLOC_NAME(get_next_obj_inside_cpuset_by_type) +#define hwloc_get_obj_inside_cpuset_by_depth HWLOC_NAME(get_obj_inside_cpuset_by_depth) +#define hwloc_get_obj_inside_cpuset_by_type HWLOC_NAME(get_obj_inside_cpuset_by_type) +#define hwloc_get_nbobjs_inside_cpuset_by_depth HWLOC_NAME(get_nbobjs_inside_cpuset_by_depth) +#define hwloc_get_nbobjs_inside_cpuset_by_type HWLOC_NAME(get_nbobjs_inside_cpuset_by_type) +#define hwloc_get_obj_index_inside_cpuset HWLOC_NAME(get_obj_index_inside_cpuset) +#define hwloc_get_child_covering_cpuset HWLOC_NAME(get_child_covering_cpuset) +#define hwloc_get_obj_covering_cpuset HWLOC_NAME(get_obj_covering_cpuset) +#define hwloc_get_next_obj_covering_cpuset_by_depth HWLOC_NAME(get_next_obj_covering_cpuset_by_depth) +#define hwloc_get_next_obj_covering_cpuset_by_type HWLOC_NAME(get_next_obj_covering_cpuset_by_type) +#define hwloc_get_cache_type_depth HWLOC_NAME(get_cache_type_depth) +#define hwloc_get_cache_covering_cpuset HWLOC_NAME(get_cache_covering_cpuset) +#define hwloc_get_shared_cache_covering_obj HWLOC_NAME(get_shared_cache_covering_obj) +#define hwloc_get_closest_objs HWLOC_NAME(get_closest_objs) +#define hwloc_get_obj_below_by_type HWLOC_NAME(get_obj_below_by_type) +#define hwloc_get_obj_below_array_by_type HWLOC_NAME(get_obj_below_array_by_type) +#define hwloc_distributev HWLOC_NAME(distributev) +#define hwloc_distribute HWLOC_NAME(distribute) +#define hwloc_alloc_membind_policy HWLOC_NAME(alloc_membind_policy) +#define hwloc_alloc_membind_policy_nodeset HWLOC_NAME(alloc_membind_policy_nodeset) +#define hwloc_topology_get_complete_cpuset HWLOC_NAME(topology_get_complete_cpuset) +#define hwloc_topology_get_topology_cpuset HWLOC_NAME(topology_get_topology_cpuset) +#define hwloc_topology_get_online_cpuset HWLOC_NAME(topology_get_online_cpuset) +#define hwloc_topology_get_allowed_cpuset HWLOC_NAME(topology_get_allowed_cpuset) +#define hwloc_topology_get_complete_nodeset HWLOC_NAME(topology_get_complete_nodeset) +#define hwloc_topology_get_topology_nodeset HWLOC_NAME(topology_get_topology_nodeset) +#define hwloc_topology_get_allowed_nodeset HWLOC_NAME(topology_get_allowed_nodeset) +#define hwloc_cpuset_to_nodeset HWLOC_NAME(cpuset_to_nodeset) +#define hwloc_cpuset_to_nodeset_strict HWLOC_NAME(cpuset_to_nodeset_strict) +#define hwloc_cpuset_from_nodeset HWLOC_NAME(cpuset_from_nodeset) +#define hwloc_cpuset_from_nodeset_strict HWLOC_NAME(cpuset_from_nodeset_strict) +#define hwloc_get_whole_distance_matrix_by_depth HWLOC_NAME(get_whole_distance_matrix_by_depth) +#define hwloc_get_whole_distance_matrix_by_type HWLOC_NAME(get_whole_distance_matrix_by_type) +#define hwloc_get_distance_matrix_covering_obj_by_depth 
HWLOC_NAME(get_distance_matrix_covering_obj_by_depth) +#define hwloc_get_latency HWLOC_NAME(get_latency) + +/* diff.h */ + +#define hwloc_topology_diff_obj_attr_type_e HWLOC_NAME(topology_diff_obj_attr_type_e) +#define hwloc_topology_diff_obj_attr_type_t HWLOC_NAME(topology_diff_obj_attr_type_t) +#define HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_SIZE HWLOC_NAME_CAPS(TOPOLOGY_DIFF_OBJ_ATTR_SIZE) +#define HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME HWLOC_NAME_CAPS(TOPOLOGY_DIFF_OBJ_ATTR_NAME) +#define HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO HWLOC_NAME_CAPS(TOPOLOGY_DIFF_OBJ_ATTR_INFO) +#define hwloc_topology_diff_obj_attr_u HWLOC_NAME(topology_diff_obj_attr_u) +#define hwloc_topology_diff_obj_attr_generic_s HWLOC_NAME(topology_diff_obj_attr_generic_s) +#define hwloc_topology_diff_obj_attr_uint64_s HWLOC_NAME(topology_diff_obj_attr_uint64_s) +#define hwloc_topology_diff_obj_attr_string_s HWLOC_NAME(topology_diff_obj_attr_string_s) +#define hwloc_topology_diff_type_e HWLOC_NAME(topology_diff_type_e) +#define hwloc_topology_diff_type_t HWLOC_NAME(topology_diff_type_t) +#define HWLOC_TOPOLOGY_DIFF_OBJ_ATTR HWLOC_NAME_CAPS(TOPOLOGY_DIFF_OBJ_ATTR) +#define HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX HWLOC_NAME_CAPS(TOPOLOGY_DIFF_TOO_COMPLEX) +#define hwloc_topology_diff_u HWLOC_NAME(topology_diff_u) +#define hwloc_topology_diff_t HWLOC_NAME(topology_diff_t) +#define hwloc_topology_diff_generic_s HWLOC_NAME(topology_diff_generic_s) +#define hwloc_topology_diff_obj_attr_s HWLOC_NAME(topology_diff_obj_attr_s) +#define hwloc_topology_diff_too_complex_s HWLOC_NAME(topology_diff_too_complex_s) +#define hwloc_topology_diff_build HWLOC_NAME(topology_diff_build) +#define hwloc_topology_diff_apply_flags_e HWLOC_NAME(topology_diff_apply_flags_e) +#define HWLOC_TOPOLOGY_DIFF_APPLY_REVERSE HWLOC_NAME_CAPS(TOPOLOGY_DIFF_APPLY_REVERSE) +#define hwloc_topology_diff_apply HWLOC_NAME(topology_diff_apply) +#define hwloc_topology_diff_destroy HWLOC_NAME(topology_diff_destroy) +#define hwloc_topology_diff_load_xml HWLOC_NAME(topology_diff_load_xml) +#define hwloc_topology_diff_export_xml HWLOC_NAME(topology_diff_export_xml) +#define hwloc_topology_diff_load_xmlbuffer HWLOC_NAME(topology_diff_load_xmlbuffer) +#define hwloc_topology_diff_export_xmlbuffer HWLOC_NAME(topology_diff_export_xmlbuffer) + +/* glibc-sched.h */ + +#define hwloc_cpuset_to_glibc_sched_affinity HWLOC_NAME(cpuset_to_glibc_sched_affinity) +#define hwloc_cpuset_from_glibc_sched_affinity HWLOC_NAME(cpuset_from_glibc_sched_affinity) + +/* linux-libnuma.h */ + +#define hwloc_cpuset_to_linux_libnuma_ulongs HWLOC_NAME(cpuset_to_linux_libnuma_ulongs) +#define hwloc_nodeset_to_linux_libnuma_ulongs HWLOC_NAME(nodeset_to_linux_libnuma_ulongs) +#define hwloc_cpuset_from_linux_libnuma_ulongs HWLOC_NAME(cpuset_from_linux_libnuma_ulongs) +#define hwloc_nodeset_from_linux_libnuma_ulongs HWLOC_NAME(nodeset_from_linux_libnuma_ulongs) +#define hwloc_cpuset_to_linux_libnuma_bitmask HWLOC_NAME(cpuset_to_linux_libnuma_bitmask) +#define hwloc_nodeset_to_linux_libnuma_bitmask HWLOC_NAME(nodeset_to_linux_libnuma_bitmask) +#define hwloc_cpuset_from_linux_libnuma_bitmask HWLOC_NAME(cpuset_from_linux_libnuma_bitmask) +#define hwloc_nodeset_from_linux_libnuma_bitmask HWLOC_NAME(nodeset_from_linux_libnuma_bitmask) + +/* linux.h */ + +#define hwloc_linux_parse_cpumap_file HWLOC_NAME(linux_parse_cpumap_file) +#define hwloc_linux_set_tid_cpubind HWLOC_NAME(linux_set_tid_cpubind) +#define hwloc_linux_get_tid_cpubind HWLOC_NAME(linux_get_tid_cpubind) + +/* openfabrics-verbs.h */ + +#define hwloc_ibv_get_device_cpuset 
HWLOC_NAME(ibv_get_device_cpuset) +#define hwloc_ibv_get_device_osdev HWLOC_NAME(ibv_get_device_osdev) +#define hwloc_ibv_get_device_osdev_by_name HWLOC_NAME(ibv_get_device_osdev_by_name) + +/* myriexpress.h */ + +#define hwloc_mx_board_get_device_cpuset HWLOC_NAME(mx_board_get_device_cpuset) +#define hwloc_mx_endpoint_get_device_cpuset HWLOC_NAME(mx_endpoint_get_device_cpuset) + +/* intel-mic.h */ + +#define hwloc_intel_mic_get_device_cpuset HWLOC_NAME(intel_mic_get_device_cpuset) +#define hwloc_intel_mic_get_device_osdev_by_index HWLOC_NAME(intel_mic_get_device_osdev_by_index) + +/* opencl.h */ + +#define hwloc_opencl_get_device_cpuset HWLOC_NAME(opencl_get_device_cpuset) +#define hwloc_opencl_get_device_osdev HWLOC_NAME(opencl_get_device_osdev) +#define hwloc_opencl_get_device_osdev_by_index HWLOC_NAME(opencl_get_device_osdev_by_index) + +/* cuda.h */ + +#define hwloc_cuda_get_device_pci_ids HWLOC_NAME(cuda_get_device_pci_ids) +#define hwloc_cuda_get_device_cpuset HWLOC_NAME(cuda_get_device_cpuset) +#define hwloc_cuda_get_device_pcidev HWLOC_NAME(cuda_get_device_pcidev) +#define hwloc_cuda_get_device_osdev HWLOC_NAME(cuda_get_device_osdev) +#define hwloc_cuda_get_device_osdev_by_index HWLOC_NAME(cuda_get_device_osdev_by_index) + +/* cudart.h */ + +#define hwloc_cudart_get_device_pci_ids HWLOC_NAME(cudart_get_device_pci_ids) +#define hwloc_cudart_get_device_cpuset HWLOC_NAME(cudart_get_device_cpuset) +#define hwloc_cudart_get_device_pcidev HWLOC_NAME(cudart_get_device_pcidev) +#define hwloc_cudart_get_device_osdev_by_index HWLOC_NAME(cudart_get_device_osdev_by_index) + +/* nvml.h */ + +#define hwloc_nvml_get_device_cpuset HWLOC_NAME(nvml_get_device_cpuset) +#define hwloc_nvml_get_device_osdev HWLOC_NAME(nvml_get_device_osdev) +#define hwloc_nvml_get_device_osdev_by_index HWLOC_NAME(nvml_get_device_osdev_by_index) + +/* gl.h */ + +#define hwloc_gl_get_display_osdev_by_port_device HWLOC_NAME(gl_get_display_osdev_by_port_device) +#define hwloc_gl_get_display_osdev_by_name HWLOC_NAME(gl_get_display_osdev_by_name) +#define hwloc_gl_get_display_by_osdev HWLOC_NAME(gl_get_display_by_osdev) + +/* hwloc/plugins.h */ + +#define hwloc_disc_component_type_e HWLOC_NAME(disc_component_type_e) +#define HWLOC_DISC_COMPONENT_TYPE_CPU HWLOC_NAME_CAPS(DISC_COMPONENT_TYPE_CPU) +#define HWLOC_DISC_COMPONENT_TYPE_GLOBAL HWLOC_NAME_CAPS(DISC_COMPONENT_TYPE_GLOBAL) +#define HWLOC_DISC_COMPONENT_TYPE_MISC HWLOC_NAME_CAPS(DISC_COMPONENT_TYPE_MISC) +#define hwloc_disc_component_type_t HWLOC_NAME(disc_component_type_t) +#define hwloc_disc_component HWLOC_NAME(disc_component) + +#define hwloc_backend HWLOC_NAME(backend) +#define hwloc_backend_flag_e HWLOC_NAME(backend_flag_e) +#define HWLOC_BACKEND_FLAG_NEED_LEVELS HWLOC_NAME_CAPS(BACKEND_FLAG_NEED_LEVELS) + +#define hwloc_backend_alloc HWLOC_NAME(backend_alloc) +#define hwloc_backend_enable HWLOC_NAME(backend_enable) +#define hwloc_backends_get_obj_cpuset HWLOC_NAME(backends_get_obj_cpuset) +#define hwloc_backends_notify_new_object HWLOC_NAME(backends_notify_new_object) + +#define hwloc_component_type_e HWLOC_NAME(component_type_e) +#define HWLOC_COMPONENT_TYPE_DISC HWLOC_NAME_CAPS(COMPONENT_TYPE_DISC) +//#define HWLOC_COMPONENT_TYPE_XML HWLOC_NAME_CAPS(COMPONENT_TYPE_XML) +#define hwloc_component_type_t HWLOC_NAME(component_type_t) +#define hwloc_component HWLOC_NAME(component) + +#define hwloc_plugin_check_namespace HWLOC_NAME(plugin_check_namespace) + +#define hwloc_insert_object_by_cpuset HWLOC_NAME(insert_object_by_cpuset) +#define hwloc_report_error_t 
HWLOC_NAME(report_error_t) +#define hwloc_report_os_error HWLOC_NAME(report_os_error) +#define hwloc_hide_errors HWLOC_NAME(hide_errors) +#define hwloc__insert_object_by_cpuset HWLOC_NAME(_insert_object_by_cpuset) +#define hwloc_insert_object_by_parent HWLOC_NAME(insert_object_by_parent) +#define hwloc_alloc_setup_object HWLOC_NAME(alloc_setup_object) +#define hwloc_fill_object_sets HWLOC_NAME(fill_object_sets) + +#define hwloc_insert_pci_device_list HWLOC_NAME(insert_pci_device_list) +#define hwloc_pci_find_cap HWLOC_NAME(pci_find_cap) +#define hwloc_pci_find_linkspeed HWLOC_NAME(pci_find_linkspeed) +#define hwloc_pci_prepare_bridge HWLOC_NAME(pci_prepare_bridge) + +/* hwloc/deprecated.h */ + +#define hwloc_obj_snprintf HWLOC_NAME(obj_snprintf) + +/* private/debug.h */ + +#define hwloc_debug HWLOC_NAME(debug) + +/* private/misc.h */ + +#define hwloc_snprintf HWLOC_NAME(snprintf) +#define hwloc_namecoloncmp HWLOC_NAME(namecoloncmp) +#define hwloc_ffsl_manual HWLOC_NAME(ffsl_manual) +#define hwloc_ffs32 HWLOC_NAME(ffs32) +#define hwloc_ffsl_from_ffs32 HWLOC_NAME(ffsl_from_ffs32) +#define hwloc_flsl_manual HWLOC_NAME(flsl_manual) +#define hwloc_fls32 HWLOC_NAME(fls32) +#define hwloc_flsl_from_fls32 HWLOC_NAME(flsl_from_fls32) +#define hwloc_weight_long HWLOC_NAME(weight_long) + +/* private/cpuid.h */ + +#define hwloc_have_cpuid HWLOC_NAME(have_cpuid) +#define hwloc_cpuid HWLOC_NAME(cpuid) + +/* private/xml.h */ + +#define hwloc__xml_verbose HWLOC_NAME(_xml_verbose) + +#define hwloc__xml_import_state_s HWLOC_NAME(_xml_import_state_s) +#define hwloc__xml_import_state_t HWLOC_NAME(_xml_import_state_t) +#define hwloc__xml_import_diff HWLOC_NAME(_xml_import_diff) +#define hwloc_xml_backend_data_s HWLOC_NAME(xml_backend_data_s) +#define hwloc__xml_export_state_s HWLOC_NAME(_xml_export_state_s) +#define hwloc__xml_export_state_t HWLOC_NAME(_xml_export_state_t) +#define hwloc__xml_export_object HWLOC_NAME(_xml_export_object) +#define hwloc__xml_export_diff HWLOC_NAME(_xml_export_diff) + +#define hwloc_xml_callbacks HWLOC_NAME(xml_callbacks) +#define hwloc_xml_component HWLOC_NAME(xml_component) +#define hwloc_xml_callbacks_register HWLOC_NAME(xml_callbacks_register) +#define hwloc_xml_callbacks_reset HWLOC_NAME(xml_callbacks_reset) + +/* private/components.h */ + +#define hwloc_disc_component_force_enable HWLOC_NAME(disc_component_force_enable) +#define hwloc_disc_components_enable_others HWLOC_NAME(disc_components_instantiate_others) + +#define hwloc_backends_disable_all HWLOC_NAME(backends_disable_all) +#define hwloc_backends_is_thissystem HWLOC_NAME(backends_is_thissystem) + +#define hwloc_components_init HWLOC_NAME(components_init) +#define hwloc_components_destroy_all HWLOC_NAME(components_destroy_all) + +/* private/private.h */ + +#define hwloc_ignore_type_e HWLOC_NAME(ignore_type_e) + +#define HWLOC_IGNORE_TYPE_NEVER HWLOC_NAME_CAPS(IGNORE_TYPE_NEVER) +#define HWLOC_IGNORE_TYPE_KEEP_STRUCTURE HWLOC_NAME_CAPS(IGNORE_TYPE_KEEP_STRUCTURE) +#define HWLOC_IGNORE_TYPE_ALWAYS HWLOC_NAME_CAPS(IGNORE_TYPE_ALWAYS) + +#define hwloc_os_distances_s HWLOC_NAME(os_distances_s) + +#define hwloc_xml_imported_distances_s HWLOC_NAME(xml_imported_distances_s) + +#define hwloc_alloc_obj_cpusets HWLOC_NAME(alloc_obj_cpusets) +#define hwloc_setup_pu_level HWLOC_NAME(setup_pu_level) +#define hwloc_get_sysctlbyname HWLOC_NAME(get_sysctlbyname) +#define hwloc_get_sysctl HWLOC_NAME(get_sysctl) +#define hwloc_fallback_nbprocessors HWLOC_NAME(fallback_nbprocessors) +#define hwloc_connect_children 
HWLOC_NAME(connect_children) +#define hwloc_connect_levels HWLOC_NAME(connect_levels) + +#define hwloc_topology_setup_defaults HWLOC_NAME(topology_setup_defaults) +#define hwloc_topology_clear HWLOC_NAME(topology_clear) + +#define hwloc_binding_hooks HWLOC_NAME(binding_hooks) +#define hwloc_set_native_binding_hooks HWLOC_NAME(set_native_binding_hooks) +#define hwloc_set_binding_hooks HWLOC_NAME(set_binding_hooks) + +#define hwloc_set_linuxfs_hooks HWLOC_NAME(set_linuxfs_hooks) +#define hwloc_set_bgq_hooks HWLOC_NAME(set_bgq_hooks) +#define hwloc_set_solaris_hooks HWLOC_NAME(set_solaris_hooks) +#define hwloc_set_aix_hooks HWLOC_NAME(set_aix_hooks) +#define hwloc_set_osf_hooks HWLOC_NAME(set_osf_hooks) +#define hwloc_set_windows_hooks HWLOC_NAME(set_windows_hooks) +#define hwloc_set_darwin_hooks HWLOC_NAME(set_darwin_hooks) +#define hwloc_set_freebsd_hooks HWLOC_NAME(set_freebsd_hooks) +#define hwloc_set_netbsd_hooks HWLOC_NAME(set_netbsd_hooks) +#define hwloc_set_hpux_hooks HWLOC_NAME(set_hpux_hooks) + +#define hwloc_add_uname_info HWLOC_NAME(add_uname_info) +#define hwloc_free_unlinked_object HWLOC_NAME(free_unlinked_object) +#define hwloc__duplicate_objects HWLOC_NAME(_duplicate_objects) + +#define hwloc_alloc_heap HWLOC_NAME(alloc_heap) +#define hwloc_alloc_mmap HWLOC_NAME(alloc_mmap) +#define hwloc_free_heap HWLOC_NAME(free_heap) +#define hwloc_free_mmap HWLOC_NAME(free_mmap) +#define hwloc_alloc_or_fail HWLOC_NAME(alloc_or_fail) + +#define hwloc_distances_init HWLOC_NAME(distances_init) +#define hwloc_distances_destroy HWLOC_NAME(distances_destroy) +#define hwloc_distances_set HWLOC_NAME(distances_set) +#define hwloc_distances_set_from_env HWLOC_NAME(distances_set_from_env) +#define hwloc_distances_restrict_os HWLOC_NAME(distances_restrict_os) +#define hwloc_distances_restrict HWLOC_NAME(distances_restrict) +#define hwloc_distances_finalize_os HWLOC_NAME(distances_finalize_os) +#define hwloc_distances_finalize_logical HWLOC_NAME(distances_finalize_logical) +#define hwloc_clear_object_distances HWLOC_NAME(clear_object_distances) +#define hwloc_clear_object_distances_one HWLOC_NAME(clear_object_distances_one) +#define hwloc_group_by_distances HWLOC_NAME(group_by_distances) + +#define hwloc_encode_to_base64 HWLOC_NAME(encode_to_base64) +#define hwloc_decode_from_base64 HWLOC_NAME(decode_from_base64) + +#define hwloc_obj_add_info_nodup HWLOC_NAME(obj_add_info_nodup) + +/* private/solaris-chiptype.h */ + +#define hwloc_solaris_get_chip_type HWLOC_NAME(solaris_get_chip_type) +#define hwloc_solaris_get_chip_model HWLOC_NAME(solaris_get_chip_model) + +#endif /* HWLOC_SYM_TRANSFORM */ + + +#ifdef __cplusplus +} /* extern "C" */ +#endif + + +#endif /* HWLOC_RENAME_H */ diff --git a/ext/hwloc/include/numa.h b/ext/hwloc/include/numa.h new file mode 100644 index 000000000..1dbc13728 --- /dev/null +++ b/ext/hwloc/include/numa.h @@ -0,0 +1,468 @@ +/* Copyright (C) 2003,2004 Andi Kleen, SuSE Labs. + + libnuma is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; version + 2.1. + + libnuma is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+
+   You should find a copy of v2.1 of the GNU Lesser General Public License
+   somewhere on your Linux system; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
+
+#ifndef _NUMA_H
+#define _NUMA_H 1
+
+/* allow an application to test for the current programming interface: */
+#define LIBNUMA_API_VERSION 2
+
+/* Simple NUMA policy library */
+
+#include <stddef.h>
+#include <string.h>
+#include <sys/types.h>
+#include <stdlib.h>
+
+#if defined(__x86_64__) || defined(__i386__)
+#define NUMA_NUM_NODES 128
+#else
+#define NUMA_NUM_NODES 2048
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct {
+        unsigned long n[NUMA_NUM_NODES/(sizeof(unsigned long)*8)];
+} nodemask_t;
+
+struct bitmask {
+        unsigned long size; /* number of bits in the map */
+        unsigned long *maskp;
+};
+
+/* operations on struct bitmask */
+int numa_bitmask_isbitset(const struct bitmask *, unsigned int);
+struct bitmask *numa_bitmask_setall(struct bitmask *);
+struct bitmask *numa_bitmask_clearall(struct bitmask *);
+struct bitmask *numa_bitmask_setbit(struct bitmask *, unsigned int);
+struct bitmask *numa_bitmask_clearbit(struct bitmask *, unsigned int);
+unsigned int numa_bitmask_nbytes(struct bitmask *);
+struct bitmask *numa_bitmask_alloc(unsigned int);
+void numa_bitmask_free(struct bitmask *);
+int numa_bitmask_equal(const struct bitmask *, const struct bitmask *);
+void copy_nodemask_to_bitmask(nodemask_t *, struct bitmask *);
+void copy_bitmask_to_nodemask(struct bitmask *, nodemask_t *);
+void copy_bitmask_to_bitmask(struct bitmask *, struct bitmask *);
+
+/* compatibility for code that used these: */
+
+static inline void nodemask_zero(nodemask_t *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = sizeof(nodemask_t) * 8;
+        numa_bitmask_clearall(&tmp);
+}
+
+static inline void nodemask_zero_compat(nodemask_t *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = sizeof(nodemask_t) * 8;
+        numa_bitmask_clearall(&tmp);
+}
+
+static inline void nodemask_set_compat(nodemask_t *mask, int node)
+{
+        mask->n[node / (8*sizeof(unsigned long))] |=
+                (1UL<<(node%(8*sizeof(unsigned long))));
+}
+
+static inline void nodemask_clr_compat(nodemask_t *mask, int node)
+{
+        mask->n[node / (8*sizeof(unsigned long))] &=
+                ~(1UL<<(node%(8*sizeof(unsigned long))));
+}
+
+static inline int nodemask_isset_compat(const nodemask_t *mask, int node)
+{
+        if ((unsigned)node >= NUMA_NUM_NODES)
+                return 0;
+        if (mask->n[node / (8*sizeof(unsigned long))] &
+                (1UL<<(node%(8*sizeof(unsigned long)))))
+                return 1;
+        return 0;
+}
+
+static inline int nodemask_equal(const nodemask_t *a, const nodemask_t *b)
+{
+        struct bitmask tmp_a, tmp_b;
+
+        tmp_a.maskp = (unsigned long *)a;
+        tmp_a.size = sizeof(nodemask_t) * 8;
+
+        tmp_b.maskp = (unsigned long *)b;
+        tmp_b.size = sizeof(nodemask_t) * 8;
+
+        return numa_bitmask_equal(&tmp_a, &tmp_b);
+}
+
+static inline int nodemask_equal_compat(const nodemask_t *a, const nodemask_t *b)
+{
+        struct bitmask tmp_a, tmp_b;
+
+        tmp_a.maskp = (unsigned long *)a;
+        tmp_a.size = sizeof(nodemask_t) * 8;
+
+        tmp_b.maskp = (unsigned long *)b;
+        tmp_b.size = sizeof(nodemask_t) * 8;
+
+        return numa_bitmask_equal(&tmp_a, &tmp_b);
+}
+
+/* NUMA support available. If this returns a negative value, all other
+   functions in this library are undefined. */
+int numa_available(void);
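As the comment above warns, numa_available() must be checked before any other call; everything else is undefined if it fails. A minimal usage sketch (illustrative only, not part of the header; build with -lnuma):

#include <numa.h>
#include <stdio.h>

int main(void)
{
    /* Gate every other libnuma call on this check. */
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    printf("highest node: %d\n", numa_max_node());
    return 0;
}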
+
+/* Basic NUMA state */
+
+/* Get max available node */
+int numa_max_node(void);
+int numa_max_possible_node(void);
+/* Return preferred node */
+int numa_preferred(void);
+
+/* Return node size and free memory */
+long long numa_node_size64(int node, long long *freep);
+long numa_node_size(int node, long *freep);
+
+int numa_pagesize(void);
+
+/* Set with all nodes from which the calling process may allocate memory.
+   Only valid after numa_available. */
+extern struct bitmask *numa_all_nodes_ptr;
+
+/* Set with all nodes the kernel has exposed to userspace */
+extern struct bitmask *numa_nodes_ptr;
+
+/* For source compatibility */
+extern nodemask_t numa_all_nodes;
+
+/* Set with all cpus. */
+extern struct bitmask *numa_all_cpus_ptr;
+
+/* Set with no nodes */
+extern struct bitmask *numa_no_nodes_ptr;
+
+/* Source compatibility */
+extern nodemask_t numa_no_nodes;
+
+/* Only run and allocate memory from a specific set of nodes. */
+void numa_bind(struct bitmask *nodes);
+
+/* Set the NUMA node interleaving mask. 0 to turn off interleaving */
+void numa_set_interleave_mask(struct bitmask *nodemask);
+
+/* Return the current interleaving mask */
+struct bitmask *numa_get_interleave_mask(void);
+
+/* allocate a bitmask big enough for all nodes */
+struct bitmask *numa_allocate_nodemask(void);
+
+static inline void numa_free_nodemask(struct bitmask *b)
+{
+        numa_bitmask_free(b);
+}
+
+/* Set the preferred node to allocate memory from for the current task. */
+void numa_set_preferred(int node);
+
+/* Set local memory allocation policy for task */
+void numa_set_localalloc(void);
+
+/* Only allocate memory from the nodes set in mask. 0 to turn off */
+void numa_set_membind(struct bitmask *nodemask);
+
+/* Return current membind */
+struct bitmask *numa_get_membind(void);
+
+/* Return the set of nodes from which memory allocation is allowed */
+struct bitmask *numa_get_mems_allowed(void);
+
+int numa_get_interleave_node(void);
+
+/* NUMA memory allocation. These functions always round to page size
+   and are relatively slow. */
+
+/* Alloc memory page interleaved on nodes in mask */
+void *numa_alloc_interleaved_subset(size_t size, struct bitmask *nodemask);
+/* Alloc memory page interleaved on all nodes. */
+void *numa_alloc_interleaved(size_t size);
+/* Alloc memory located on node */
+void *numa_alloc_onnode(size_t size, int node);
+/* Alloc memory on local node */
+void *numa_alloc_local(size_t size);
+/* Allocation with current policy */
+void *numa_alloc(size_t size);
+/* Change the size of a memory area preserving the memory policy */
+void *numa_realloc(void *old_addr, size_t old_size, size_t new_size);
+/* Free memory allocated by the functions above */
+void numa_free(void *mem, size_t size);
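The numa_alloc* family above always rounds the request up to whole pages and is markedly slower than malloc(), so it suits large, long-lived buffers. A sketch of the intended pattern (illustrative only; the helper name and sizes are hypothetical):

#include <numa.h>
#include <string.h>

static double *node_local_buffer(size_t n, int node)
{
    /* Rounded up to a whole page; returns NULL on failure. */
    double *buf = numa_alloc_onnode(n * sizeof(*buf), node);
    if (buf)
        memset(buf, 0, n * sizeof(*buf)); /* touch so the pages are committed */
    return buf;
}

The matching release must pass the same size back: numa_free(buf, n * sizeof(*buf)).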
+
+/* Low level functions, primarily for shared memory. All memory
+   processed by these must not be touched yet */
+
+/* Interleave a memory area. */
+void numa_interleave_memory(void *mem, size_t size, struct bitmask *mask);
+
+/* Allocate a memory area on a specific node. */
+void numa_tonode_memory(void *start, size_t size, int node);
+
+/* Allocate memory on a mask of nodes. */
+void numa_tonodemask_memory(void *mem, size_t size, struct bitmask *mask);
+
+/* Allocate a memory area on the current node. */
+void numa_setlocal_memory(void *start, size_t size);
+
+/* Allocate a memory area with the current memory policy */
+void numa_police_memory(void *start, size_t size);
+
+/* Run current task only on nodes in mask */
+int numa_run_on_node_mask(struct bitmask *mask);
+/* Run current task only on node */
+int numa_run_on_node(int node);
+/* Return current mask of nodes the task can run on */
+struct bitmask *numa_get_run_node_mask(void);
+
+/* When strict, fail the allocation if memory cannot be allocated on the
+   target node(s). */
+void numa_set_bind_policy(int strict);
+
+/* Fail when existing memory has incompatible policy */
+void numa_set_strict(int flag);
+
+/* maximum nodes (size of kernel nodemask_t) */
+int numa_num_possible_nodes();
+
+/* maximum cpus (size of kernel cpumask_t) */
+int numa_num_possible_cpus();
+
+/* nodes in the system */
+int numa_num_configured_nodes();
+
+/* maximum cpus */
+int numa_num_configured_cpus();
+
+/* maximum cpus allowed to current task */
+int numa_num_task_cpus();
+int numa_num_thread_cpus(); /* backward compatibility */
+
+/* maximum nodes allowed to current task */
+int numa_num_task_nodes();
+int numa_num_thread_nodes(); /* backward compatibility */
+
+/* allocate a bitmask the size of the kernel cpumask_t */
+struct bitmask *numa_allocate_cpumask();
+
+static inline void numa_free_cpumask(struct bitmask *b)
+{
+        numa_bitmask_free(b);
+}
+
+/* Convert node to CPU mask. -1/errno on failure, otherwise 0. */
+int numa_node_to_cpus(int, struct bitmask *);
+
+/* report the node of the specified cpu. -1/errno on invalid cpu. */
+int numa_node_of_cpu(int cpu);
+
+/* Report distance of node1 from node2. 0 on error. */
+int numa_distance(int node1, int node2);
+
+/* Error handling. */
+/* This is an internal function in libnuma that can be overwritten by a user
+   program. Default is to print an error to stderr and exit if
+   numa_exit_on_error is true. */
+void numa_error(char *where);
+
+/* When true, exit the program when a NUMA system call (except numa_available)
+   fails */
+extern int numa_exit_on_error;
+/* Warning function. Can also be overwritten. Default is to print on stderr
+   once. */
+void numa_warn(int num, char *fmt, ...);
+
+/* When true, exit the program on a numa_warn() call */
+extern int numa_exit_on_warn;
+
+int numa_migrate_pages(int pid, struct bitmask *from, struct bitmask *to);
+
+int numa_move_pages(int pid, unsigned long count, void **pages,
+                const int *nodes, int *status, int flags);
+
+int numa_sched_getaffinity(pid_t, struct bitmask *);
+int numa_sched_setaffinity(pid_t, struct bitmask *);
+
+/* Convert an ASCII list of nodes to a bitmask */
+struct bitmask *numa_parse_nodestring(char *);
+
+/* Convert an ASCII list of cpus to a bitmask */
+struct bitmask *numa_parse_cpustring(char *);
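The run/membind calls above pair with the bitmask constructors declared earlier, either built bit by bit or parsed from an ASCII list. A sketch that pins memory and execution to node 0 (illustrative only; assumes node 0 exists and omits most error handling):

#include <numa.h>

static int pin_to_node0(void)
{
    struct bitmask *nodes;

    if (numa_available() < 0)
        return -1;

    /* Build the mask by hand... */
    nodes = numa_allocate_nodemask();
    numa_bitmask_setbit(nodes, 0);
    numa_set_membind(nodes);        /* allocate memory from node 0 only */
    numa_free_nodemask(nodes);

    /* ...or parse it from an ASCII node list. */
    nodes = numa_parse_nodestring("0");
    if (!nodes)
        return -1;
    numa_run_on_node_mask(nodes);   /* run on node 0's CPUs only */
    numa_bitmask_free(nodes);
    return 0;
}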
+
+/*
+ * The following functions are for source code compatibility
+ * with releases prior to version 2.
+ * Such code should be compiled with NUMA_VERSION1_COMPATIBILITY defined.
+ */
+
+static inline void numa_set_interleave_mask_compat(nodemask_t *nodemask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)nodemask;
+        tmp.size = sizeof(nodemask_t) * 8;
+        numa_set_interleave_mask(&tmp);
+}
+
+static inline nodemask_t numa_get_interleave_mask_compat()
+{
+        struct bitmask *tp;
+        nodemask_t mask;
+
+        tp = numa_get_interleave_mask();
+        copy_bitmask_to_nodemask(tp, &mask);
+        numa_bitmask_free(tp);
+        return mask;
+}
+
+static inline void numa_bind_compat(nodemask_t *mask)
+{
+        struct bitmask *tp;
+
+        tp = numa_allocate_nodemask();
+        copy_nodemask_to_bitmask(mask, tp);
+        numa_bind(tp);
+        numa_bitmask_free(tp);
+}
+
+static inline void numa_set_membind_compat(nodemask_t *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = sizeof(nodemask_t) * 8;
+        numa_set_membind(&tmp);
+}
+
+static inline nodemask_t numa_get_membind_compat()
+{
+        struct bitmask *tp;
+        nodemask_t mask;
+
+        tp = numa_get_membind();
+        copy_bitmask_to_nodemask(tp, &mask);
+        numa_bitmask_free(tp);
+        return mask;
+}
+
+static inline void *numa_alloc_interleaved_subset_compat(size_t size,
+                const nodemask_t *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = sizeof(nodemask_t) * 8;
+        return numa_alloc_interleaved_subset(size, &tmp);
+}
+
+static inline int numa_run_on_node_mask_compat(const nodemask_t *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = sizeof(nodemask_t) * 8;
+        return numa_run_on_node_mask(&tmp);
+}
+
+static inline nodemask_t numa_get_run_node_mask_compat()
+{
+        struct bitmask *tp;
+        nodemask_t mask;
+
+        tp = numa_get_run_node_mask();
+        copy_bitmask_to_nodemask(tp, &mask);
+        numa_bitmask_free(tp);
+        return mask;
+}
+
+static inline void numa_interleave_memory_compat(void *mem, size_t size,
+                const nodemask_t *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = sizeof(nodemask_t) * 8;
+        numa_interleave_memory(mem, size, &tmp);
+}
+
+static inline void numa_tonodemask_memory_compat(void *mem, size_t size,
+                const nodemask_t *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = sizeof(nodemask_t) * 8;
+        numa_tonodemask_memory(mem, size, &tmp);
+}
+
+static inline int numa_sched_getaffinity_compat(pid_t pid, unsigned len,
+                unsigned long *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = len * 8;
+        return numa_sched_getaffinity(pid, &tmp);
+}
+
+static inline int numa_sched_setaffinity_compat(pid_t pid, unsigned len,
+                unsigned long *mask)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)mask;
+        tmp.size = len * 8;
+        return numa_sched_setaffinity(pid, &tmp);
+}
+
+static inline int numa_node_to_cpus_compat(int node, unsigned long *buffer,
+                int buffer_len)
+{
+        struct bitmask tmp;
+
+        tmp.maskp = (unsigned long *)buffer;
+        tmp.size = buffer_len * 8;
+        return numa_node_to_cpus(node, &tmp);
+}
+
+/* end of version 1 compatibility functions */
+
+/*
+ * To compile an application that uses libnuma version 1:
+ * add -DNUMA_VERSION1_COMPATIBILITY to your Makefile's CFLAGS
+ */
+#ifdef NUMA_VERSION1_COMPATIBILITY
+#include <numacompat1.h>
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
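Every compat shim above follows one pattern: wrap the caller's fixed-size nodemask_t in a struct bitmask header and forward to the version-2 entry point. A sketch of pre-version-2 style code using them (illustrative only; new code should use struct bitmask directly):

#include <numa.h>

void bind_to_node0_v1_style(void)
{
    nodemask_t mask;

    nodemask_zero_compat(&mask);
    nodemask_set_compat(&mask, 0);  /* select node 0 */
    numa_bind_compat(&mask);        /* CPU- and memory-bind to that node */
}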
diff --git a/ext/hwloc/include/pci/config.h b/ext/hwloc/include/pci/config.h
new file mode 100644
index 000000000..1f05b1241
--- /dev/null
+++ b/ext/hwloc/include/pci/config.h
@@ -0,0 +1,18 @@
+#define PCI_CONFIG_H
+#define PCI_ARCH_X86_64
+#define PCI_OS_LINUX
+#define PCI_HAVE_PM_LINUX_SYSFS
+#define PCI_HAVE_PM_LINUX_PROC
+#define PCI_HAVE_LINUX_BYTEORDER_H
+#define PCI_PATH_PROC_BUS_PCI "/proc/bus/pci"
+#define PCI_PATH_SYS_BUS_PCI "/sys/bus/pci"
+#define PCI_HAVE_PM_INTEL_CONF
+#define PCI_HAVE_64BIT_ADDRESS
+#define PCI_HAVE_PM_DUMP
+#define PCI_COMPRESSED_IDS
+#define PCI_IDS "pci.ids.gz"
+#define PCI_PATH_IDS_DIR "/usr/share/misc"
+#define PCI_USE_DNS
+#define PCI_ID_DOMAIN "pci.id.ucw.cz"
+#define PCI_SHARED_LIB
+#define PCILIB_VERSION "3.1.8"
diff --git a/ext/hwloc/include/pci/header.h b/ext/hwloc/include/pci/header.h
new file mode 100644
index 000000000..d481f2769
--- /dev/null
+++ b/ext/hwloc/include/pci/header.h
@@ -0,0 +1,1195 @@
+/*
+ * The PCI Library -- PCI Header Structure (based on <linux/pci.h>)
+ *
+ * Copyright (c) 1997--2010 Martin Mares <mj@ucw.cz>
+ *
+ * Can be freely distributed and used under the terms of the GNU GPL.
+ */
+
+/*
+ * Under PCI, each device has 256 bytes of configuration address space,
+ * of which the first 64 bytes are standardized as follows:
+ */
+#define PCI_VENDOR_ID 0x00 /* 16 bits */
+#define PCI_DEVICE_ID 0x02 /* 16 bits */
+#define PCI_COMMAND 0x04 /* 16 bits */
+#define PCI_COMMAND_IO 0x1 /* Enable response in I/O space */
+#define PCI_COMMAND_MEMORY 0x2 /* Enable response in Memory space */
+#define PCI_COMMAND_MASTER 0x4 /* Enable bus mastering */
+#define PCI_COMMAND_SPECIAL 0x8 /* Enable response to special cycles */
+#define PCI_COMMAND_INVALIDATE 0x10 /* Use memory write and invalidate */
+#define PCI_COMMAND_VGA_PALETTE 0x20 /* Enable palette snooping */
+#define PCI_COMMAND_PARITY 0x40 /* Enable parity checking */
+#define PCI_COMMAND_WAIT 0x80 /* Enable address/data stepping */
+#define PCI_COMMAND_SERR 0x100 /* Enable SERR */
+#define PCI_COMMAND_FAST_BACK 0x200 /* Enable back-to-back writes */
+#define PCI_COMMAND_DISABLE_INTx 0x400 /* PCIE: Disable INTx interrupts */
+
+#define PCI_STATUS 0x06 /* 16 bits */
+#define PCI_STATUS_INTx 0x08 /* PCIE: INTx interrupt pending */
+#define PCI_STATUS_CAP_LIST 0x10 /* Support Capability List */
+#define PCI_STATUS_66MHZ 0x20 /* Support 66 MHz PCI 2.1 bus */
+#define PCI_STATUS_UDF 0x40 /* Support User Definable Features [obsolete] */
+#define PCI_STATUS_FAST_BACK 0x80 /* Accept fast back-to-back */
+#define PCI_STATUS_PARITY 0x100 /* Detected parity error */
+#define PCI_STATUS_DEVSEL_MASK 0x600 /* DEVSEL timing */
+#define PCI_STATUS_DEVSEL_FAST 0x000
+#define PCI_STATUS_DEVSEL_MEDIUM 0x200
+#define PCI_STATUS_DEVSEL_SLOW 0x400
+#define PCI_STATUS_SIG_TARGET_ABORT 0x800 /* Set on target abort */
+#define PCI_STATUS_REC_TARGET_ABORT 0x1000 /* Master ack of " */
+#define PCI_STATUS_REC_MASTER_ABORT 0x2000 /* Set on master abort */
+#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000 /* Set when we drive SERR */
+#define PCI_STATUS_DETECTED_PARITY 0x8000 /* Set on parity error */
+
+#define PCI_CLASS_REVISION 0x08 /* High 24 bits are class, low 8 revision */
+#define PCI_REVISION_ID 0x08 /* Revision ID */
+#define PCI_CLASS_PROG 0x09 /* Reg. Level Programming Interface */
+#define PCI_CLASS_DEVICE 0x0a /* Device class */
+
+#define PCI_CACHE_LINE_SIZE 0x0c /* 8 bits */
+#define PCI_LATENCY_TIMER 0x0d /* 8 bits */
+#define PCI_HEADER_TYPE 0x0e /* 8 bits */
+#define PCI_HEADER_TYPE_NORMAL 0
+#define PCI_HEADER_TYPE_BRIDGE 1
+#define PCI_HEADER_TYPE_CARDBUS 2
+
+#define PCI_BIST 0x0f /* 8 bits */
+#define PCI_BIST_CODE_MASK 0x0f /* Return result */
+#define PCI_BIST_START 0x40 /* 1 to start BIST, 2 secs or less */
+#define PCI_BIST_CAPABLE 0x80 /* 1 if BIST capable */
+
+/*
+ * Base addresses specify locations in memory or I/O space.
+ * Decoded size can be determined by writing a value of + * 0xffffffff to the register, and reading it back. Only + * 1 bits are decoded. + */ +#define PCI_BASE_ADDRESS_0 0x10 /* 32 bits */ +#define PCI_BASE_ADDRESS_1 0x14 /* 32 bits [htype 0,1 only] */ +#define PCI_BASE_ADDRESS_2 0x18 /* 32 bits [htype 0 only] */ +#define PCI_BASE_ADDRESS_3 0x1c /* 32 bits */ +#define PCI_BASE_ADDRESS_4 0x20 /* 32 bits */ +#define PCI_BASE_ADDRESS_5 0x24 /* 32 bits */ +#define PCI_BASE_ADDRESS_SPACE 0x01 /* 0 = memory, 1 = I/O */ +#define PCI_BASE_ADDRESS_SPACE_IO 0x01 +#define PCI_BASE_ADDRESS_SPACE_MEMORY 0x00 +#define PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06 +#define PCI_BASE_ADDRESS_MEM_TYPE_32 0x00 /* 32 bit address */ +#define PCI_BASE_ADDRESS_MEM_TYPE_1M 0x02 /* Below 1M [obsolete] */ +#define PCI_BASE_ADDRESS_MEM_TYPE_64 0x04 /* 64 bit address */ +#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08 /* prefetchable? */ +#define PCI_BASE_ADDRESS_MEM_MASK (~(pciaddr_t)0x0f) +#define PCI_BASE_ADDRESS_IO_MASK (~(pciaddr_t)0x03) +/* bit 1 is reserved if address_space = 1 */ + +/* Header type 0 (normal devices) */ +#define PCI_CARDBUS_CIS 0x28 +#define PCI_SUBSYSTEM_VENDOR_ID 0x2c +#define PCI_SUBSYSTEM_ID 0x2e +#define PCI_ROM_ADDRESS 0x30 /* Bits 31..11 are address, 10..1 reserved */ +#define PCI_ROM_ADDRESS_ENABLE 0x01 +#define PCI_ROM_ADDRESS_MASK (~(pciaddr_t)0x7ff) + +#define PCI_CAPABILITY_LIST 0x34 /* Offset of first capability list entry */ + +/* 0x35-0x3b are reserved */ +#define PCI_INTERRUPT_LINE 0x3c /* 8 bits */ +#define PCI_INTERRUPT_PIN 0x3d /* 8 bits */ +#define PCI_MIN_GNT 0x3e /* 8 bits */ +#define PCI_MAX_LAT 0x3f /* 8 bits */ + +/* Header type 1 (PCI-to-PCI bridges) */ +#define PCI_PRIMARY_BUS 0x18 /* Primary bus number */ +#define PCI_SECONDARY_BUS 0x19 /* Secondary bus number */ +#define PCI_SUBORDINATE_BUS 0x1a /* Highest bus number behind the bridge */ +#define PCI_SEC_LATENCY_TIMER 0x1b /* Latency timer for secondary interface */ +#define PCI_IO_BASE 0x1c /* I/O range behind the bridge */ +#define PCI_IO_LIMIT 0x1d +#define PCI_IO_RANGE_TYPE_MASK 0x0f /* I/O bridging type */ +#define PCI_IO_RANGE_TYPE_16 0x00 +#define PCI_IO_RANGE_TYPE_32 0x01 +#define PCI_IO_RANGE_MASK ~0x0f +#define PCI_SEC_STATUS 0x1e /* Secondary status register */ +#define PCI_MEMORY_BASE 0x20 /* Memory range behind */ +#define PCI_MEMORY_LIMIT 0x22 +#define PCI_MEMORY_RANGE_TYPE_MASK 0x0f +#define PCI_MEMORY_RANGE_MASK ~0x0f +#define PCI_PREF_MEMORY_BASE 0x24 /* Prefetchable memory range behind */ +#define PCI_PREF_MEMORY_LIMIT 0x26 +#define PCI_PREF_RANGE_TYPE_MASK 0x0f +#define PCI_PREF_RANGE_TYPE_32 0x00 +#define PCI_PREF_RANGE_TYPE_64 0x01 +#define PCI_PREF_RANGE_MASK ~0x0f +#define PCI_PREF_BASE_UPPER32 0x28 /* Upper half of prefetchable memory range */ +#define PCI_PREF_LIMIT_UPPER32 0x2c +#define PCI_IO_BASE_UPPER16 0x30 /* Upper half of I/O addresses */ +#define PCI_IO_LIMIT_UPPER16 0x32 +/* 0x34 same as for htype 0 */ +/* 0x35-0x3b is reserved */ +#define PCI_ROM_ADDRESS1 0x38 /* Same as PCI_ROM_ADDRESS, but for htype 1 */ +/* 0x3c-0x3d are same as for htype 0 */ +#define PCI_BRIDGE_CONTROL 0x3e +#define PCI_BRIDGE_CTL_PARITY 0x01 /* Enable parity detection on secondary interface */ +#define PCI_BRIDGE_CTL_SERR 0x02 /* The same for SERR forwarding */ +#define PCI_BRIDGE_CTL_NO_ISA 0x04 /* Disable bridging of ISA ports */ +#define PCI_BRIDGE_CTL_VGA 0x08 /* Forward VGA addresses */ +#define PCI_BRIDGE_CTL_MASTER_ABORT 0x20 /* Report master aborts */ +#define PCI_BRIDGE_CTL_BUS_RESET 0x40 /* Secondary bus reset 
*/ +#define PCI_BRIDGE_CTL_FAST_BACK 0x80 /* Fast Back2Back enabled on secondary interface */ +#define PCI_BRIDGE_CTL_PRI_DISCARD_TIMER 0x100 /* PCI-X? */ +#define PCI_BRIDGE_CTL_SEC_DISCARD_TIMER 0x200 /* PCI-X? */ +#define PCI_BRIDGE_CTL_DISCARD_TIMER_STATUS 0x400 /* PCI-X? */ +#define PCI_BRIDGE_CTL_DISCARD_TIMER_SERR_EN 0x800 /* PCI-X? */ + +/* Header type 2 (CardBus bridges) */ +/* 0x14-0x15 reserved */ +#define PCI_CB_SEC_STATUS 0x16 /* Secondary status */ +#define PCI_CB_PRIMARY_BUS 0x18 /* PCI bus number */ +#define PCI_CB_CARD_BUS 0x19 /* CardBus bus number */ +#define PCI_CB_SUBORDINATE_BUS 0x1a /* Subordinate bus number */ +#define PCI_CB_LATENCY_TIMER 0x1b /* CardBus latency timer */ +#define PCI_CB_MEMORY_BASE_0 0x1c +#define PCI_CB_MEMORY_LIMIT_0 0x20 +#define PCI_CB_MEMORY_BASE_1 0x24 +#define PCI_CB_MEMORY_LIMIT_1 0x28 +#define PCI_CB_IO_BASE_0 0x2c +#define PCI_CB_IO_BASE_0_HI 0x2e +#define PCI_CB_IO_LIMIT_0 0x30 +#define PCI_CB_IO_LIMIT_0_HI 0x32 +#define PCI_CB_IO_BASE_1 0x34 +#define PCI_CB_IO_BASE_1_HI 0x36 +#define PCI_CB_IO_LIMIT_1 0x38 +#define PCI_CB_IO_LIMIT_1_HI 0x3a +#define PCI_CB_IO_RANGE_MASK ~0x03 +/* 0x3c-0x3d are same as for htype 0 */ +#define PCI_CB_BRIDGE_CONTROL 0x3e +#define PCI_CB_BRIDGE_CTL_PARITY 0x01 /* Similar to standard bridge control register */ +#define PCI_CB_BRIDGE_CTL_SERR 0x02 +#define PCI_CB_BRIDGE_CTL_ISA 0x04 +#define PCI_CB_BRIDGE_CTL_VGA 0x08 +#define PCI_CB_BRIDGE_CTL_MASTER_ABORT 0x20 +#define PCI_CB_BRIDGE_CTL_CB_RESET 0x40 /* CardBus reset */ +#define PCI_CB_BRIDGE_CTL_16BIT_INT 0x80 /* Enable interrupt for 16-bit cards */ +#define PCI_CB_BRIDGE_CTL_PREFETCH_MEM0 0x100 /* Prefetch enable for both memory regions */ +#define PCI_CB_BRIDGE_CTL_PREFETCH_MEM1 0x200 +#define PCI_CB_BRIDGE_CTL_POST_WRITES 0x400 +#define PCI_CB_SUBSYSTEM_VENDOR_ID 0x40 +#define PCI_CB_SUBSYSTEM_ID 0x42 +#define PCI_CB_LEGACY_MODE_BASE 0x44 /* 16-bit PC Card legacy mode base address (ExCa) */ +/* 0x48-0x7f reserved */ + +/* Capability lists */ + +#define PCI_CAP_LIST_ID 0 /* Capability ID */ +#define PCI_CAP_ID_PM 0x01 /* Power Management */ +#define PCI_CAP_ID_AGP 0x02 /* Accelerated Graphics Port */ +#define PCI_CAP_ID_VPD 0x03 /* Vital Product Data */ +#define PCI_CAP_ID_SLOTID 0x04 /* Slot Identification */ +#define PCI_CAP_ID_MSI 0x05 /* Message Signaled Interrupts */ +#define PCI_CAP_ID_CHSWP 0x06 /* CompactPCI HotSwap */ +#define PCI_CAP_ID_PCIX 0x07 /* PCI-X */ +#define PCI_CAP_ID_HT 0x08 /* HyperTransport */ +#define PCI_CAP_ID_VNDR 0x09 /* Vendor specific */ +#define PCI_CAP_ID_DBG 0x0A /* Debug port */ +#define PCI_CAP_ID_CCRC 0x0B /* CompactPCI Central Resource Control */ +#define PCI_CAP_ID_HOTPLUG 0x0C /* PCI hot-plug */ +#define PCI_CAP_ID_SSVID 0x0D /* Bridge subsystem vendor/device ID */ +#define PCI_CAP_ID_AGP3 0x0E /* AGP 8x */ +#define PCI_CAP_ID_SECURE 0x0F /* Secure device (?) 
*/ +#define PCI_CAP_ID_EXP 0x10 /* PCI Express */ +#define PCI_CAP_ID_MSIX 0x11 /* MSI-X */ +#define PCI_CAP_ID_SATA 0x12 /* Serial-ATA HBA */ +#define PCI_CAP_ID_AF 0x13 /* Advanced features of PCI devices integrated in PCIe root cplx */ +#define PCI_CAP_LIST_NEXT 1 /* Next capability in the list */ +#define PCI_CAP_FLAGS 2 /* Capability defined flags (16 bits) */ +#define PCI_CAP_SIZEOF 4 + +/* Capabilities residing in the PCI Express extended configuration space */ + +#define PCI_EXT_CAP_ID_AER 0x01 /* Advanced Error Reporting */ +#define PCI_EXT_CAP_ID_VC 0x02 /* Virtual Channel */ +#define PCI_EXT_CAP_ID_DSN 0x03 /* Device Serial Number */ +#define PCI_EXT_CAP_ID_PB 0x04 /* Power Budgeting */ +#define PCI_EXT_CAP_ID_RCLINK 0x05 /* Root Complex Link Declaration */ +#define PCI_EXT_CAP_ID_RCILINK 0x06 /* Root Complex Internal Link Declaration */ +#define PCI_EXT_CAP_ID_RCECOLL 0x07 /* Root Complex Event Collector */ +#define PCI_EXT_CAP_ID_MFVC 0x08 /* Multi-Function Virtual Channel */ +#define PCI_EXT_CAP_ID_VC2 0x09 /* Virtual Channel (2nd ID) */ +#define PCI_EXT_CAP_ID_RBCB 0x0a /* Root Bridge Control Block */ +#define PCI_EXT_CAP_ID_VNDR 0x0b /* Vendor specific */ +#define PCI_EXT_CAP_ID_ACS 0x0d /* Access Controls */ +#define PCI_EXT_CAP_ID_ARI 0x0e /* Alternative Routing-ID Interpretation */ +#define PCI_EXT_CAP_ID_ATS 0x0f /* Address Translation Service */ +#define PCI_EXT_CAP_ID_SRIOV 0x10 /* Single Root I/O Virtualization */ +#define PCI_EXT_CAP_ID_TPH 0x17 /* Transaction processing hints */ +#define PCI_EXT_CAP_ID_LTR 0x18 /* Latency Tolerance Reporting */ + +/*** Definitions of capabilities ***/ + +/* Power Management Registers */ + +#define PCI_PM_CAP_VER_MASK 0x0007 /* Version (2=PM1.1) */ +#define PCI_PM_CAP_PME_CLOCK 0x0008 /* Clock required for PME generation */ +#define PCI_PM_CAP_DSI 0x0020 /* Device specific initialization required */ +#define PCI_PM_CAP_AUX_C_MASK 0x01c0 /* Maximum aux current required in D3cold */ +#define PCI_PM_CAP_D1 0x0200 /* D1 power state support */ +#define PCI_PM_CAP_D2 0x0400 /* D2 power state support */ +#define PCI_PM_CAP_PME_D0 0x0800 /* PME can be asserted from D0 */ +#define PCI_PM_CAP_PME_D1 0x1000 /* PME can be asserted from D1 */ +#define PCI_PM_CAP_PME_D2 0x2000 /* PME can be asserted from D2 */ +#define PCI_PM_CAP_PME_D3_HOT 0x4000 /* PME can be asserted from D3hot */ +#define PCI_PM_CAP_PME_D3_COLD 0x8000 /* PME can be asserted from D3cold */ +#define PCI_PM_CTRL 4 /* PM control and status register */ +#define PCI_PM_CTRL_STATE_MASK 0x0003 /* Current power state (D0 to D3) */ +#define PCI_PM_CTRL_NO_SOFT_RST 0x0008 /* No Soft Reset from D3hot to D0 */ +#define PCI_PM_CTRL_PME_ENABLE 0x0100 /* PME pin enable */ +#define PCI_PM_CTRL_DATA_SEL_MASK 0x1e00 /* PM table data index */ +#define PCI_PM_CTRL_DATA_SCALE_MASK 0x6000 /* PM table data scaling factor */ +#define PCI_PM_CTRL_PME_STATUS 0x8000 /* PME pin status */ +#define PCI_PM_PPB_EXTENSIONS 6 /* PPB support extensions */ +#define PCI_PM_PPB_B2_B3 0x40 /* If bridge enters D3hot, bus enters: 0=B3, 1=B2 */ +#define PCI_PM_BPCC_ENABLE 0x80 /* Secondary bus is power managed */ +#define PCI_PM_DATA_REGISTER 7 /* PM table contents read here */ +#define PCI_PM_SIZEOF 8 + +/* AGP registers */ + +#define PCI_AGP_VERSION 2 /* BCD version number */ +#define PCI_AGP_RFU 3 /* Rest of capability flags */ +#define PCI_AGP_STATUS 4 /* Status register */ +#define PCI_AGP_STATUS_RQ_MASK 0xff000000 /* Maximum number of requests - 1 */ +#define PCI_AGP_STATUS_ISOCH 0x10000 /* Isochronous 
transactions supported */ +#define PCI_AGP_STATUS_ARQSZ_MASK 0xe000 /* log2(optimum async req size in bytes) - 4 */ +#define PCI_AGP_STATUS_CAL_MASK 0x1c00 /* Calibration cycle timing */ +#define PCI_AGP_STATUS_SBA 0x0200 /* Sideband addressing supported */ +#define PCI_AGP_STATUS_ITA_COH 0x0100 /* In-aperture accesses always coherent */ +#define PCI_AGP_STATUS_GART64 0x0080 /* 64-bit GART entries supported */ +#define PCI_AGP_STATUS_HTRANS 0x0040 /* If 0, core logic can xlate host CPU accesses thru aperture */ +#define PCI_AGP_STATUS_64BIT 0x0020 /* 64-bit addressing cycles supported */ +#define PCI_AGP_STATUS_FW 0x0010 /* Fast write transfers supported */ +#define PCI_AGP_STATUS_AGP3 0x0008 /* AGP3 mode supported */ +#define PCI_AGP_STATUS_RATE4 0x0004 /* 4x transfer rate supported (RFU in AGP3 mode) */ +#define PCI_AGP_STATUS_RATE2 0x0002 /* 2x transfer rate supported (8x in AGP3 mode) */ +#define PCI_AGP_STATUS_RATE1 0x0001 /* 1x transfer rate supported (4x in AGP3 mode) */ +#define PCI_AGP_COMMAND 8 /* Control register */ +#define PCI_AGP_COMMAND_RQ_MASK 0xff000000 /* Master: Maximum number of requests */ +#define PCI_AGP_COMMAND_ARQSZ_MASK 0xe000 /* log2(optimum async req size in bytes) - 4 */ +#define PCI_AGP_COMMAND_CAL_MASK 0x1c00 /* Calibration cycle timing */ +#define PCI_AGP_COMMAND_SBA 0x0200 /* Sideband addressing enabled */ +#define PCI_AGP_COMMAND_AGP 0x0100 /* Allow processing of AGP transactions */ +#define PCI_AGP_COMMAND_GART64 0x0080 /* 64-bit GART entries enabled */ +#define PCI_AGP_COMMAND_64BIT 0x0020 /* Allow generation of 64-bit addr cycles */ +#define PCI_AGP_COMMAND_FW 0x0010 /* Enable FW transfers */ +#define PCI_AGP_COMMAND_RATE4 0x0004 /* Use 4x rate (RFU in AGP3 mode) */ +#define PCI_AGP_COMMAND_RATE2 0x0002 /* Use 2x rate (8x in AGP3 mode) */ +#define PCI_AGP_COMMAND_RATE1 0x0001 /* Use 1x rate (4x in AGP3 mode) */ +#define PCI_AGP_SIZEOF 12 + +/* Vital Product Data */ + +#define PCI_VPD_ADDR 2 /* Address to access (15 bits!) 
*/ +#define PCI_VPD_ADDR_MASK 0x7fff /* Address mask */ +#define PCI_VPD_ADDR_F 0x8000 /* Write 0, 1 indicates completion */ +#define PCI_VPD_DATA 4 /* 32-bits of data returned here */ + +/* Slot Identification */ + +#define PCI_SID_ESR 2 /* Expansion Slot Register */ +#define PCI_SID_ESR_NSLOTS 0x1f /* Number of expansion slots available */ +#define PCI_SID_ESR_FIC 0x20 /* First In Chassis Flag */ +#define PCI_SID_CHASSIS_NR 3 /* Chassis Number */ + +/* Message Signaled Interrupts registers */ + +#define PCI_MSI_FLAGS 2 /* Various flags */ +#define PCI_MSI_FLAGS_MASK_BIT 0x100 /* interrupt masking & reporting supported */ +#define PCI_MSI_FLAGS_64BIT 0x080 /* 64-bit addresses allowed */ +#define PCI_MSI_FLAGS_QSIZE 0x070 /* Message queue size configured */ +#define PCI_MSI_FLAGS_QMASK 0x00e /* Maximum queue size available */ +#define PCI_MSI_FLAGS_ENABLE 0x001 /* MSI feature enabled */ +#define PCI_MSI_RFU 3 /* Rest of capability flags */ +#define PCI_MSI_ADDRESS_LO 4 /* Lower 32 bits */ +#define PCI_MSI_ADDRESS_HI 8 /* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */ +#define PCI_MSI_DATA_32 8 /* 16 bits of data for 32-bit devices */ +#define PCI_MSI_DATA_64 12 /* 16 bits of data for 64-bit devices */ +#define PCI_MSI_MASK_BIT_32 12 /* per-vector masking for 32-bit devices */ +#define PCI_MSI_MASK_BIT_64 16 /* per-vector masking for 64-bit devices */ +#define PCI_MSI_PENDING_32 16 /* per-vector interrupt pending for 32-bit devices */ +#define PCI_MSI_PENDING_64 20 /* per-vector interrupt pending for 64-bit devices */ + +/* PCI-X */ +#define PCI_PCIX_COMMAND 2 /* Command register offset */ +#define PCI_PCIX_COMMAND_DPERE 0x0001 /* Data Parity Error Recover Enable */ +#define PCI_PCIX_COMMAND_ERO 0x0002 /* Enable Relaxed Ordering */ +#define PCI_PCIX_COMMAND_MAX_MEM_READ_BYTE_COUNT 0x000c /* Maximum Memory Read Byte Count */ +#define PCI_PCIX_COMMAND_MAX_OUTSTANDING_SPLIT_TRANS 0x0070 +#define PCI_PCIX_COMMAND_RESERVED 0xf80 +#define PCI_PCIX_STATUS 4 /* Status register offset */ +#define PCI_PCIX_STATUS_FUNCTION 0x00000007 +#define PCI_PCIX_STATUS_DEVICE 0x000000f8 +#define PCI_PCIX_STATUS_BUS 0x0000ff00 +#define PCI_PCIX_STATUS_64BIT 0x00010000 +#define PCI_PCIX_STATUS_133MHZ 0x00020000 +#define PCI_PCIX_STATUS_SC_DISCARDED 0x00040000 /* Split Completion Discarded */ +#define PCI_PCIX_STATUS_UNEXPECTED_SC 0x00080000 /* Unexpected Split Completion */ +#define PCI_PCIX_STATUS_DEVICE_COMPLEXITY 0x00100000 /* 0 = simple device, 1 = bridge device */ +#define PCI_PCIX_STATUS_DESIGNED_MAX_MEM_READ_BYTE_COUNT 0x00600000 /* 0 = 512 bytes, 1 = 1024, 2 = 2048, 3 = 4096 */ +#define PCI_PCIX_STATUS_DESIGNED_MAX_OUTSTANDING_SPLIT_TRANS 0x03800000 +#define PCI_PCIX_STATUS_DESIGNED_MAX_CUMULATIVE_READ_SIZE 0x1c000000 +#define PCI_PCIX_STATUS_RCVD_SC_ERR_MESS 0x20000000 /* Received Split Completion Error Message */ +#define PCI_PCIX_STATUS_266MHZ 0x40000000 /* 266 MHz capable */ +#define PCI_PCIX_STATUS_533MHZ 0x80000000 /* 533 MHz capable */ +#define PCI_PCIX_SIZEOF 4 + +/* PCI-X Bridges */ +#define PCI_PCIX_BRIDGE_SEC_STATUS 2 /* Secondary bus status register offset */ +#define PCI_PCIX_BRIDGE_SEC_STATUS_64BIT 0x0001 +#define PCI_PCIX_BRIDGE_SEC_STATUS_133MHZ 0x0002 +#define PCI_PCIX_BRIDGE_SEC_STATUS_SC_DISCARDED 0x0004 /* Split Completion Discarded on secondary bus */ +#define PCI_PCIX_BRIDGE_SEC_STATUS_UNEXPECTED_SC 0x0008 /* Unexpected Split Completion on secondary bus */ +#define PCI_PCIX_BRIDGE_SEC_STATUS_SC_OVERRUN 0x0010 /* Split Completion Overrun on secondary bus */ +#define 
PCI_PCIX_BRIDGE_SEC_STATUS_SPLIT_REQUEST_DELAYED 0x0020 +#define PCI_PCIX_BRIDGE_SEC_STATUS_CLOCK_FREQ 0x01c0 +#define PCI_PCIX_BRIDGE_SEC_STATUS_RESERVED 0xfe00 +#define PCI_PCIX_BRIDGE_STATUS 4 /* Primary bus status register offset */ +#define PCI_PCIX_BRIDGE_STATUS_FUNCTION 0x00000007 +#define PCI_PCIX_BRIDGE_STATUS_DEVICE 0x000000f8 +#define PCI_PCIX_BRIDGE_STATUS_BUS 0x0000ff00 +#define PCI_PCIX_BRIDGE_STATUS_64BIT 0x00010000 +#define PCI_PCIX_BRIDGE_STATUS_133MHZ 0x00020000 +#define PCI_PCIX_BRIDGE_STATUS_SC_DISCARDED 0x00040000 /* Split Completion Discarded */ +#define PCI_PCIX_BRIDGE_STATUS_UNEXPECTED_SC 0x00080000 /* Unexpected Split Completion */ +#define PCI_PCIX_BRIDGE_STATUS_SC_OVERRUN 0x00100000 /* Split Completion Overrun */ +#define PCI_PCIX_BRIDGE_STATUS_SPLIT_REQUEST_DELAYED 0x00200000 +#define PCI_PCIX_BRIDGE_STATUS_RESERVED 0xffc00000 +#define PCI_PCIX_BRIDGE_UPSTREAM_SPLIT_TRANS_CTRL 8 /* Upstream Split Transaction Register offset */ +#define PCI_PCIX_BRIDGE_DOWNSTREAM_SPLIT_TRANS_CTRL 12 /* Downstream Split Transaction Register offset */ +#define PCI_PCIX_BRIDGE_STR_CAPACITY 0x0000ffff +#define PCI_PCIX_BRIDGE_STR_COMMITMENT_LIMIT 0xffff0000 +#define PCI_PCIX_BRIDGE_SIZEOF 12 + +/* HyperTransport (as of spec rev. 2.00) */ +#define PCI_HT_CMD 2 /* Command Register */ +#define PCI_HT_CMD_TYP_HI 0xe000 /* Capability Type high part */ +#define PCI_HT_CMD_TYP_HI_PRI 0x0000 /* Slave or Primary Interface */ +#define PCI_HT_CMD_TYP_HI_SEC 0x2000 /* Host or Secondary Interface */ +#define PCI_HT_CMD_TYP 0xf800 /* Capability Type */ +#define PCI_HT_CMD_TYP_SW 0x4000 /* Switch */ +#define PCI_HT_CMD_TYP_IDC 0x8000 /* Interrupt Discovery and Configuration */ +#define PCI_HT_CMD_TYP_RID 0x8800 /* Revision ID */ +#define PCI_HT_CMD_TYP_UIDC 0x9000 /* UnitID Clumping */ +#define PCI_HT_CMD_TYP_ECSA 0x9800 /* Extended Configuration Space Access */ +#define PCI_HT_CMD_TYP_AM 0xa000 /* Address Mapping */ +#define PCI_HT_CMD_TYP_MSIM 0xa800 /* MSI Mapping */ +#define PCI_HT_CMD_TYP_DR 0xb000 /* DirectRoute */ +#define PCI_HT_CMD_TYP_VCS 0xb800 /* VCSet */ +#define PCI_HT_CMD_TYP_RM 0xc000 /* Retry Mode */ +#define PCI_HT_CMD_TYP_X86 0xc800 /* X86 (reserved) */ + + /* Link Control Register */ +#define PCI_HT_LCTR_CFLE 0x0002 /* CRC Flood Enable */ +#define PCI_HT_LCTR_CST 0x0004 /* CRC Start Test */ +#define PCI_HT_LCTR_CFE 0x0008 /* CRC Force Error */ +#define PCI_HT_LCTR_LKFAIL 0x0010 /* Link Failure */ +#define PCI_HT_LCTR_INIT 0x0020 /* Initialization Complete */ +#define PCI_HT_LCTR_EOC 0x0040 /* End of Chain */ +#define PCI_HT_LCTR_TXO 0x0080 /* Transmitter Off */ +#define PCI_HT_LCTR_CRCERR 0x0f00 /* CRC Error */ +#define PCI_HT_LCTR_ISOCEN 0x1000 /* Isochronous Flow Control Enable */ +#define PCI_HT_LCTR_LSEN 0x2000 /* LDTSTOP# Tristate Enable */ +#define PCI_HT_LCTR_EXTCTL 0x4000 /* Extended CTL Time */ +#define PCI_HT_LCTR_64B 0x8000 /* 64-bit Addressing Enable */ + + /* Link Configuration Register */ +#define PCI_HT_LCNF_MLWI 0x0007 /* Max Link Width In */ +#define PCI_HT_LCNF_LW_8B 0x0 /* Link Width 8 bits */ +#define PCI_HT_LCNF_LW_16B 0x1 /* Link Width 16 bits */ +#define PCI_HT_LCNF_LW_32B 0x3 /* Link Width 32 bits */ +#define PCI_HT_LCNF_LW_2B 0x4 /* Link Width 2 bits */ +#define PCI_HT_LCNF_LW_4B 0x5 /* Link Width 4 bits */ +#define PCI_HT_LCNF_LW_NC 0x7 /* Link physically not connected */ +#define PCI_HT_LCNF_DFI 0x0008 /* Doubleword Flow Control In */ +#define PCI_HT_LCNF_MLWO 0x0070 /* Max Link Width Out */ +#define PCI_HT_LCNF_DFO 0x0080 /* Doubleword Flow Control 
+ /* Link Configuration Register */
+#define PCI_HT_LCNF_MLWI 0x0007 /* Max Link Width In */
+#define PCI_HT_LCNF_LW_8B 0x0 /* Link Width 8 bits */
+#define PCI_HT_LCNF_LW_16B 0x1 /* Link Width 16 bits */
+#define PCI_HT_LCNF_LW_32B 0x3 /* Link Width 32 bits */
+#define PCI_HT_LCNF_LW_2B 0x4 /* Link Width 2 bits */
+#define PCI_HT_LCNF_LW_4B 0x5 /* Link Width 4 bits */
+#define PCI_HT_LCNF_LW_NC 0x7 /* Link physically not connected */
+#define PCI_HT_LCNF_DFI 0x0008 /* Doubleword Flow Control In */
+#define PCI_HT_LCNF_MLWO 0x0070 /* Max Link Width Out */
+#define PCI_HT_LCNF_DFO 0x0080 /* Doubleword Flow Control Out */
+#define PCI_HT_LCNF_LWI 0x0700 /* Link Width In */
+#define PCI_HT_LCNF_DFIE 0x0800 /* Doubleword Flow Control In Enable */
+#define PCI_HT_LCNF_LWO 0x7000 /* Link Width Out */
+#define PCI_HT_LCNF_DFOE 0x8000 /* Doubleword Flow Control Out Enable */
+
+ /* Revision ID Register */
+#define PCI_HT_RID_MIN 0x1f /* Minor Revision */
+#define PCI_HT_RID_MAJ 0xe0 /* Major Revision */
+
+ /* Link Frequency/Error Register */
+#define PCI_HT_LFRER_FREQ 0x0f /* Transmitter Clock Frequency */
+#define PCI_HT_LFRER_200 0x00 /* 200MHz */
+#define PCI_HT_LFRER_300 0x01 /* 300MHz */
+#define PCI_HT_LFRER_400 0x02 /* 400MHz */
+#define PCI_HT_LFRER_500 0x03 /* 500MHz */
+#define PCI_HT_LFRER_600 0x04 /* 600MHz */
+#define PCI_HT_LFRER_800 0x05 /* 800MHz */
+#define PCI_HT_LFRER_1000 0x06 /* 1.0GHz */
+#define PCI_HT_LFRER_1200 0x07 /* 1.2GHz */
+#define PCI_HT_LFRER_1400 0x08 /* 1.4GHz */
+#define PCI_HT_LFRER_1600 0x09 /* 1.6GHz */
+#define PCI_HT_LFRER_VEND 0x0f /* Vendor-Specific */
+#define PCI_HT_LFRER_ERR 0xf0 /* Link Error */
+#define PCI_HT_LFRER_PROT 0x10 /* Protocol Error */
+#define PCI_HT_LFRER_OV 0x20 /* Overflow Error */
+#define PCI_HT_LFRER_EOC 0x40 /* End of Chain Error */
+#define PCI_HT_LFRER_CTLT 0x80 /* CTL Timeout */
+
+ /* Link Frequency Capability Register */
+#define PCI_HT_LFCAP_200 0x0001 /* 200MHz */
+#define PCI_HT_LFCAP_300 0x0002 /* 300MHz */
+#define PCI_HT_LFCAP_400 0x0004 /* 400MHz */
+#define PCI_HT_LFCAP_500 0x0008 /* 500MHz */
+#define PCI_HT_LFCAP_600 0x0010 /* 600MHz */
+#define PCI_HT_LFCAP_800 0x0020 /* 800MHz */
+#define PCI_HT_LFCAP_1000 0x0040 /* 1.0GHz */
+#define PCI_HT_LFCAP_1200 0x0080 /* 1.2GHz */
+#define PCI_HT_LFCAP_1400 0x0100 /* 1.4GHz */
+#define PCI_HT_LFCAP_1600 0x0200 /* 1.6GHz */
+#define PCI_HT_LFCAP_VEND 0x8000 /* Vendor-Specific */
+
+ /* Feature Register */
+#define PCI_HT_FTR_ISOCFC 0x0001 /* Isochronous Flow Control Mode */
+#define PCI_HT_FTR_LDTSTOP 0x0002 /* LDTSTOP# Supported */
+#define PCI_HT_FTR_CRCTM 0x0004 /* CRC Test Mode */
+#define PCI_HT_FTR_ECTLT 0x0008 /* Extended CTL Time Required */
+#define PCI_HT_FTR_64BA 0x0010 /* 64-bit Addressing */
+#define PCI_HT_FTR_UIDRD 0x0020 /* UnitID Reorder Disable */
+
+ /* Error Handling Register */
+#define PCI_HT_EH_PFLE 0x0001 /* Protocol Error Flood Enable */
+#define PCI_HT_EH_OFLE 0x0002 /* Overflow Error Flood Enable */
+#define PCI_HT_EH_PFE 0x0004 /* Protocol Error Fatal Enable */
+#define PCI_HT_EH_OFE 0x0008 /* Overflow Error Fatal Enable */
+#define PCI_HT_EH_EOCFE 0x0010 /* End of Chain Error Fatal Enable */
+#define PCI_HT_EH_RFE 0x0020 /* Response Error Fatal Enable */
+#define PCI_HT_EH_CRCFE 0x0040 /* CRC Error Fatal Enable */
+#define PCI_HT_EH_SERRFE 0x0080 /* System Error Fatal Enable */
+#define PCI_HT_EH_CF 0x0100 /* Chain Fail */
+#define PCI_HT_EH_RE 0x0200 /* Response Error */
+#define PCI_HT_EH_PNFE 0x0400 /* Protocol Error Nonfatal Enable */
+#define PCI_HT_EH_ONFE 0x0800 /* Overflow Error Nonfatal Enable */
+#define PCI_HT_EH_EOCNFE 0x1000 /* End of Chain Error Nonfatal Enable */
+#define PCI_HT_EH_RNFE 0x2000 /* Response Error Nonfatal Enable */
+#define PCI_HT_EH_CRCNFE 0x4000 /* CRC Error Nonfatal Enable */
+#define PCI_HT_EH_SERRNFE 0x8000 /* System Error Nonfatal Enable */
+
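+/*
+ * Editor's note: an illustrative sketch, not part of the original header.
+ * Slave/primary and host/secondary capabilities are distinguished by the top
+ * three bits only (PCI_HT_CMD_TYP_HI); every other HyperTransport capability
+ * type uses the full five-bit PCI_HT_CMD_TYP field.  `cmd' is assumed to
+ * hold the 16-bit HT command register:
+ */
+#if 0 /* illustration only */
+static unsigned int ht_cap_type(u16 cmd)
+{
+  unsigned int hi = cmd & PCI_HT_CMD_TYP_HI;
+  if (hi == PCI_HT_CMD_TYP_HI_PRI || hi == PCI_HT_CMD_TYP_HI_SEC)
+    return hi; /* interface capability */
+  return cmd & PCI_HT_CMD_TYP; /* e.g. PCI_HT_CMD_TYP_MSIM */
+}
+#endif
+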
+/* HyperTransport: Slave or Primary Interface */
+#define PCI_HT_PRI_CMD 2 /* Command Register */
+#define PCI_HT_PRI_CMD_BUID 0x001f /* Base UnitID */
+#define PCI_HT_PRI_CMD_UC 0x03e0 /* Unit Count */
+#define PCI_HT_PRI_CMD_MH 0x0400 /* Master Host */
+#define PCI_HT_PRI_CMD_DD 0x0800 /* Default Direction */
+#define PCI_HT_PRI_CMD_DUL 0x1000 /* Drop on Uninitialized Link */
+
+#define PCI_HT_PRI_LCTR0 4 /* Link Control 0 Register */
+#define PCI_HT_PRI_LCNF0 6 /* Link Config 0 Register */
+#define PCI_HT_PRI_LCTR1 8 /* Link Control 1 Register */
+#define PCI_HT_PRI_LCNF1 10 /* Link Config 1 Register */
+#define PCI_HT_PRI_RID 12 /* Revision ID Register */
+#define PCI_HT_PRI_LFRER0 13 /* Link Frequency/Error 0 Register */
+#define PCI_HT_PRI_LFCAP0 14 /* Link Frequency Capability 0 Register */
+#define PCI_HT_PRI_FTR 16 /* Feature Register */
+#define PCI_HT_PRI_LFRER1 17 /* Link Frequency/Error 1 Register */
+#define PCI_HT_PRI_LFCAP1 18 /* Link Frequency Capability 1 Register */
+#define PCI_HT_PRI_ES 20 /* Enumeration Scratchpad Register */
+#define PCI_HT_PRI_EH 22 /* Error Handling Register */
+#define PCI_HT_PRI_MBU 24 /* Memory Base Upper Register */
+#define PCI_HT_PRI_MLU 25 /* Memory Limit Upper Register */
+#define PCI_HT_PRI_BN 26 /* Bus Number Register */
+#define PCI_HT_PRI_SIZEOF 28
+
+/* HyperTransport: Host or Secondary Interface */
+#define PCI_HT_SEC_CMD 2 /* Command Register */
+#define PCI_HT_SEC_CMD_WR 0x0001 /* Warm Reset */
+#define PCI_HT_SEC_CMD_DE 0x0002 /* Double-Ended */
+#define PCI_HT_SEC_CMD_DN 0x007c /* Device Number */
+#define PCI_HT_SEC_CMD_CS 0x0080 /* Chain Side */
+#define PCI_HT_SEC_CMD_HH 0x0100 /* Host Hide */
+#define PCI_HT_SEC_CMD_AS 0x0400 /* Act as Slave */
+#define PCI_HT_SEC_CMD_HIECE 0x0800 /* Host Inbound End of Chain Error */
+#define PCI_HT_SEC_CMD_DUL 0x1000 /* Drop on Uninitialized Link */
+
+#define PCI_HT_SEC_LCTR 4 /* Link Control Register */
+#define PCI_HT_SEC_LCNF 6 /* Link Config Register */
+#define PCI_HT_SEC_RID 8 /* Revision ID Register */
+#define PCI_HT_SEC_LFRER 9 /* Link Frequency/Error Register */
+#define PCI_HT_SEC_LFCAP 10 /* Link Frequency Capability Register */
+#define PCI_HT_SEC_FTR 12 /* Feature Register */
+#define PCI_HT_SEC_FTR_EXTRS 0x0100 /* Extended Register Set */
+#define PCI_HT_SEC_FTR_UCNFE 0x0200 /* Upstream Configuration Enable */
+#define PCI_HT_SEC_ES 16 /* Enumeration Scratchpad Register */
+#define PCI_HT_SEC_EH 18 /* Error Handling Register */
+#define PCI_HT_SEC_MBU 20 /* Memory Base Upper Register */
+#define PCI_HT_SEC_MLU 21 /* Memory Limit Upper Register */
+#define PCI_HT_SEC_SIZEOF 24
+
+/* HyperTransport: Switch */
+#define PCI_HT_SW_CMD 2 /* Switch Command Register */
+#define PCI_HT_SW_CMD_VIBERR 0x0080 /* VIB Error */
+#define PCI_HT_SW_CMD_VIBFL 0x0100 /* VIB Flood */
+#define PCI_HT_SW_CMD_VIBFT 0x0200 /* VIB Fatal */
+#define PCI_HT_SW_CMD_VIBNFT 0x0400 /* VIB Nonfatal */
+#define PCI_HT_SW_PMASK 4 /* Partition Mask Register */
+#define PCI_HT_SW_SWINF 8 /* Switch Info Register */
+#define PCI_HT_SW_SWINF_DP 0x0000001f /* Default Port */
+#define PCI_HT_SW_SWINF_EN 0x00000020 /* Enable Decode */
+#define PCI_HT_SW_SWINF_CR 0x00000040 /* Cold Reset */
+#define PCI_HT_SW_SWINF_PCIDX 0x00000f00 /* Performance Counter Index */
+#define PCI_HT_SW_SWINF_BLRIDX 0x0003f000 /* Base/Limit Range Index */
+#define PCI_HT_SW_SWINF_SBIDX 0x00002000 /* Secondary Base Range Index */
+#define PCI_HT_SW_SWINF_HP 0x00040000 /* Hot Plug */
+#define PCI_HT_SW_SWINF_HIDE 0x00080000 /* Hide Port */
+#define PCI_HT_SW_PCD 12 /* Performance Counter Data Register */
+#define PCI_HT_SW_BLRD 16 /* Base/Limit Range Data Register */
+#define PCI_HT_SW_SBD 20 /* Secondary Base Data Register */
+#define PCI_HT_SW_SIZEOF 24
+
+ /* Counter indices */
+#define PCI_HT_SW_PC_PCR 0x0 /* Posted Command Receive */
+#define PCI_HT_SW_PC_NPCR 0x1 /* Nonposted Command Receive */
+#define PCI_HT_SW_PC_RCR 0x2 /* Response Command Receive */
+#define PCI_HT_SW_PC_PDWR 0x3 /* Posted DW Receive */
+#define PCI_HT_SW_PC_NPDWR 0x4 /* Nonposted DW Receive */
+#define PCI_HT_SW_PC_RDWR 0x5 /* Response DW Receive */
+#define PCI_HT_SW_PC_PCT 0x6 /* Posted Command Transmit */
+#define PCI_HT_SW_PC_NPCT 0x7 /* Nonposted Command Transmit */
+#define PCI_HT_SW_PC_RCT 0x8 /* Response Command Transmit */
+#define PCI_HT_SW_PC_PDWT 0x9 /* Posted DW Transmit */
+#define PCI_HT_SW_PC_NPDWT 0xa /* Nonposted DW Transmit */
+#define PCI_HT_SW_PC_RDWT 0xb /* Response DW Transmit */
+
+ /* Base/Limit Range indices */
+#define PCI_HT_SW_BLR_BASE0_LO 0x0 /* Base 0[31:1], Enable */
+#define PCI_HT_SW_BLR_BASE0_HI 0x1 /* Base 0 Upper */
+#define PCI_HT_SW_BLR_LIM0_LO 0x2 /* Limit 0 Lower */
+#define PCI_HT_SW_BLR_LIM0_HI 0x3 /* Limit 0 Upper */
+
+ /* Secondary Base indices */
+#define PCI_HT_SW_SB_LO 0x0 /* Secondary Base[31:1], Enable */
+#define PCI_HT_SW_S0_HI 0x1 /* Secondary Base Upper */
+
+/* HyperTransport: Interrupt Discovery and Configuration */
+#define PCI_HT_IDC_IDX 2 /* Index Register */
+#define PCI_HT_IDC_DATA 4 /* Data Register */
+#define PCI_HT_IDC_SIZEOF 8
+
+ /* Register indices */
+#define PCI_HT_IDC_IDX_LINT 0x01 /* Last Interrupt Register */
+#define PCI_HT_IDC_LINT 0x00ff0000 /* Last interrupt definition */
+#define PCI_HT_IDC_IDX_IDR 0x10 /* Interrupt Definition Registers */
+ /* Low part (at index) */
+#define PCI_HT_IDC_IDR_MASK 0x10000001 /* Mask */
+#define PCI_HT_IDC_IDR_POL 0x10000002 /* Polarity */
+#define PCI_HT_IDC_IDR_II_2 0x1000001c /* IntrInfo[4:2]: Message Type */
+#define PCI_HT_IDC_IDR_II_5 0x10000020 /* IntrInfo[5]: Request EOI */
+#define PCI_HT_IDC_IDR_II_6 0x00ffffc0 /* IntrInfo[23:6] */
+#define PCI_HT_IDC_IDR_II_24 0xff000000 /* IntrInfo[31:24] */
+ /* High part (at index + 1) */
+#define PCI_HT_IDC_IDR_II_32 0x00ffffff /* IntrInfo[55:32] */
+#define PCI_HT_IDC_IDR_PASSPW 0x40000000 /* PassPW setting for messages */
+#define PCI_HT_IDC_IDR_WEOI 0x80000000 /* Waiting for EOI */
+
+/* HyperTransport: Revision ID */
+#define PCI_HT_RID_RID 2 /* Revision Register */
+#define PCI_HT_RID_SIZEOF 4
+
+/* HyperTransport: UnitID Clumping */
+#define PCI_HT_UIDC_CS 4 /* Clumping Support Register */
+#define PCI_HT_UIDC_CE 8 /* Clumping Enable Register */
+#define PCI_HT_UIDC_SIZEOF 12
+
+/* HyperTransport: Extended Configuration Space Access */
+#define PCI_HT_ECSA_ADDR 4 /* Configuration Address Register */
+#define PCI_HT_ECSA_ADDR_REG 0x00000ffc /* Register */
+#define PCI_HT_ECSA_ADDR_FUN 0x00007000 /* Function */
+#define PCI_HT_ECSA_ADDR_DEV 0x000f8000 /* Device */
+#define PCI_HT_ECSA_ADDR_BUS 0x0ff00000 /* Bus Number */
+#define PCI_HT_ECSA_ADDR_TYPE 0x10000000 /* Access Type */
+#define PCI_HT_ECSA_DATA 8 /* Configuration Data Register */
+#define PCI_HT_ECSA_SIZEOF 12
+
+/* HyperTransport: Address Mapping */
+#define PCI_HT_AM_CMD 2 /* Command Register */
+#define PCI_HT_AM_CMD_NDMA 0x000f /* Number of DMA Mappings */
+#define PCI_HT_AM_CMD_IOSIZ 0x01f0 /* I/O Size */
+#define PCI_HT_AM_CMD_MT 0x0600 /* Map Type */
+#define PCI_HT_AM_CMD_MT_40B 0x0000 /* 40-bit */
+#define PCI_HT_AM_CMD_MT_64B 0x0200 /* 64-bit */
+
+ /* Window Control Register bits */
+#define PCI_HT_AM_SBW_CTR_COMP 0x1 /* Compat */
+#define PCI_HT_AM_SBW_CTR_NCOH 0x2 /* NonCoherent */
+#define PCI_HT_AM_SBW_CTR_ISOC 0x4 /* Isochronous */
+#define 
PCI_HT_AM_SBW_CTR_EN 0x8 /* Enable */ + +/* HyperTransport: 40-bit Address Mapping */ +#define PCI_HT_AM40_SBNPW 4 /* Secondary Bus Non-Prefetchable Window Register */ +#define PCI_HT_AM40_SBW_BASE 0x000fffff /* Window Base */ +#define PCI_HT_AM40_SBW_CTR 0xf0000000 /* Window Control */ +#define PCI_HT_AM40_SBPW 8 /* Secondary Bus Prefetchable Window Register */ +#define PCI_HT_AM40_DMA_PBASE0 12 /* DMA Window Primary Base 0 Register */ +#define PCI_HT_AM40_DMA_CTR0 15 /* DMA Window Control 0 Register */ +#define PCI_HT_AM40_DMA_CTR_CTR 0xf0 /* Window Control */ +#define PCI_HT_AM40_DMA_SLIM0 16 /* DMA Window Secondary Limit 0 Register */ +#define PCI_HT_AM40_DMA_SBASE0 18 /* DMA Window Secondary Base 0 Register */ +#define PCI_HT_AM40_SIZEOF 12 /* size is variable: 12 + 8 * NDMA */ + +/* HyperTransport: 64-bit Address Mapping */ +#define PCI_HT_AM64_IDX 4 /* Index Register */ +#define PCI_HT_AM64_DATA_LO 8 /* Data Lower Register */ +#define PCI_HT_AM64_DATA_HI 12 /* Data Upper Register */ +#define PCI_HT_AM64_SIZEOF 16 + + /* Register indices */ +#define PCI_HT_AM64_IDX_SBNPW 0x00 /* Secondary Bus Non-Prefetchable Window Register */ +#define PCI_HT_AM64_W_BASE_LO 0xfff00000 /* Window Base Lower */ +#define PCI_HT_AM64_W_CTR 0x0000000f /* Window Control */ +#define PCI_HT_AM64_IDX_SBPW 0x01 /* Secondary Bus Prefetchable Window Register */ +#define PCI_HT_AM64_IDX_PBNPW 0x02 /* Primary Bus Non-Prefetchable Window Register */ +#define PCI_HT_AM64_IDX_DMAPB0 0x04 /* DMA Window Primary Base 0 Register */ +#define PCI_HT_AM64_IDX_DMASB0 0x05 /* DMA Window Secondary Base 0 Register */ +#define PCI_HT_AM64_IDX_DMASL0 0x06 /* DMA Window Secondary Limit 0 Register */ + +/* HyperTransport: MSI Mapping */ +#define PCI_HT_MSIM_CMD 2 /* Command Register */ +#define PCI_HT_MSIM_CMD_EN 0x0001 /* Mapping Active */ +#define PCI_HT_MSIM_CMD_FIXD 0x0002 /* MSI Mapping Address Fixed */ +#define PCI_HT_MSIM_ADDR_LO 4 /* MSI Mapping Address Lower Register */ +#define PCI_HT_MSIM_ADDR_HI 8 /* MSI Mapping Address Upper Register */ +#define PCI_HT_MSIM_SIZEOF 12 + +/* HyperTransport: DirectRoute */ +#define PCI_HT_DR_CMD 2 /* Command Register */ +#define PCI_HT_DR_CMD_NDRS 0x000f /* Number of DirectRoute Spaces */ +#define PCI_HT_DR_CMD_IDX 0x01f0 /* Index */ +#define PCI_HT_DR_EN 4 /* Enable Vector Register */ +#define PCI_HT_DR_DATA 8 /* Data Register */ +#define PCI_HT_DR_SIZEOF 12 + + /* Register indices */ +#define PCI_HT_DR_IDX_BASE_LO 0x00 /* DirectRoute Base Lower Register */ +#define PCI_HT_DR_OTNRD 0x00000001 /* Opposite to Normal Request Direction */ +#define PCI_HT_DR_BL_LO 0xffffff00 /* Base/Limit Lower */ +#define PCI_HT_DR_IDX_BASE_HI 0x01 /* DirectRoute Base Upper Register */ +#define PCI_HT_DR_IDX_LIMIT_LO 0x02 /* DirectRoute Limit Lower Register */ +#define PCI_HT_DR_IDX_LIMIT_HI 0x03 /* DirectRoute Limit Upper Register */ + +/* HyperTransport: VCSet */ +#define PCI_HT_VCS_SUP 4 /* VCSets Supported Register */ +#define PCI_HT_VCS_L1EN 5 /* Link 1 VCSets Enabled Register */ +#define PCI_HT_VCS_L0EN 6 /* Link 0 VCSets Enabled Register */ +#define PCI_HT_VCS_SBD 8 /* Stream Bucket Depth Register */ +#define PCI_HT_VCS_SINT 9 /* Stream Interval Register */ +#define PCI_HT_VCS_SSUP 10 /* Number of Streaming VCs Supported Register */ +#define PCI_HT_VCS_SSUP_0 0x00 /* Streaming VC 0 */ +#define PCI_HT_VCS_SSUP_3 0x01 /* Streaming VCs 0-3 */ +#define PCI_HT_VCS_SSUP_15 0x02 /* Streaming VCs 0-15 */ +#define PCI_HT_VCS_NFCBD 12 /* Non-FC Bucket Depth Register */ +#define PCI_HT_VCS_NFCINT 13 /* Non-FC 
Bucket Interval Register */ +#define PCI_HT_VCS_SIZEOF 16 + +/* HyperTransport: Retry Mode */ +#define PCI_HT_RM_CTR0 4 /* Control 0 Register */ +#define PCI_HT_RM_CTR_LRETEN 0x01 /* Link Retry Enable */ +#define PCI_HT_RM_CTR_FSER 0x02 /* Force Single Error */ +#define PCI_HT_RM_CTR_ROLNEN 0x04 /* Rollover Nonfatal Enable */ +#define PCI_HT_RM_CTR_FSS 0x08 /* Force Single Stomp */ +#define PCI_HT_RM_CTR_RETNEN 0x10 /* Retry Nonfatal Enable */ +#define PCI_HT_RM_CTR_RETFEN 0x20 /* Retry Fatal Enable */ +#define PCI_HT_RM_CTR_AA 0xc0 /* Allowed Attempts */ +#define PCI_HT_RM_STS0 5 /* Status 0 Register */ +#define PCI_HT_RM_STS_RETSNT 0x01 /* Retry Sent */ +#define PCI_HT_RM_STS_CNTROL 0x02 /* Count Rollover */ +#define PCI_HT_RM_STS_SRCV 0x04 /* Stomp Received */ +#define PCI_HT_RM_CTR1 6 /* Control 1 Register */ +#define PCI_HT_RM_STS1 7 /* Status 1 Register */ +#define PCI_HT_RM_CNT0 8 /* Retry Count 0 Register */ +#define PCI_HT_RM_CNT1 10 /* Retry Count 1 Register */ +#define PCI_HT_RM_SIZEOF 12 + +/* Vendor-Specific Capability (see PCI_EVNDR_xxx for the PCIe version) */ +#define PCI_VNDR_LENGTH 2 /* Length byte */ + +/* PCI Express */ +#define PCI_EXP_FLAGS 0x2 /* Capabilities register */ +#define PCI_EXP_FLAGS_VERS 0x000f /* Capability version */ +#define PCI_EXP_FLAGS_TYPE 0x00f0 /* Device/Port type */ +#define PCI_EXP_TYPE_ENDPOINT 0x0 /* Express Endpoint */ +#define PCI_EXP_TYPE_LEG_END 0x1 /* Legacy Endpoint */ +#define PCI_EXP_TYPE_ROOT_PORT 0x4 /* Root Port */ +#define PCI_EXP_TYPE_UPSTREAM 0x5 /* Upstream Port */ +#define PCI_EXP_TYPE_DOWNSTREAM 0x6 /* Downstream Port */ +#define PCI_EXP_TYPE_PCI_BRIDGE 0x7 /* PCI/PCI-X Bridge */ +#define PCI_EXP_TYPE_PCIE_BRIDGE 0x8 /* PCI/PCI-X to PCIE Bridge */ +#define PCI_EXP_TYPE_ROOT_INT_EP 0x9 /* Root Complex Integrated Endpoint */ +#define PCI_EXP_TYPE_ROOT_EC 0xa /* Root Complex Event Collector */ +#define PCI_EXP_FLAGS_SLOT 0x0100 /* Slot implemented */ +#define PCI_EXP_FLAGS_IRQ 0x3e00 /* Interrupt message number */ +#define PCI_EXP_DEVCAP 0x4 /* Device capabilities */ +#define PCI_EXP_DEVCAP_PAYLOAD 0x07 /* Max_Payload_Size */ +#define PCI_EXP_DEVCAP_PHANTOM 0x18 /* Phantom functions */ +#define PCI_EXP_DEVCAP_EXT_TAG 0x20 /* Extended tags */ +#define PCI_EXP_DEVCAP_L0S 0x1c0 /* L0s Acceptable Latency */ +#define PCI_EXP_DEVCAP_L1 0xe00 /* L1 Acceptable Latency */ +#define PCI_EXP_DEVCAP_ATN_BUT 0x1000 /* Attention Button Present */ +#define PCI_EXP_DEVCAP_ATN_IND 0x2000 /* Attention Indicator Present */ +#define PCI_EXP_DEVCAP_PWR_IND 0x4000 /* Power Indicator Present */ +#define PCI_EXP_DEVCAP_RBE 0x8000 /* Role-Based Error Reporting */ +#define PCI_EXP_DEVCAP_PWR_VAL 0x3fc0000 /* Slot Power Limit Value */ +#define PCI_EXP_DEVCAP_PWR_SCL 0xc000000 /* Slot Power Limit Scale */ +#define PCI_EXP_DEVCAP_FLRESET 0x10000000 /* Function-Level Reset */ +#define PCI_EXP_DEVCTL 0x8 /* Device Control */ +#define PCI_EXP_DEVCTL_CERE 0x0001 /* Correctable Error Reporting En. */ +#define PCI_EXP_DEVCTL_NFERE 0x0002 /* Non-Fatal Error Reporting Enable */ +#define PCI_EXP_DEVCTL_FERE 0x0004 /* Fatal Error Reporting Enable */ +#define PCI_EXP_DEVCTL_URRE 0x0008 /* Unsupported Request Reporting En. 
*/ +#define PCI_EXP_DEVCTL_RELAXED 0x0010 /* Enable Relaxed Ordering */ +#define PCI_EXP_DEVCTL_PAYLOAD 0x00e0 /* Max_Payload_Size */ +#define PCI_EXP_DEVCTL_EXT_TAG 0x0100 /* Extended Tag Field Enable */ +#define PCI_EXP_DEVCTL_PHANTOM 0x0200 /* Phantom Functions Enable */ +#define PCI_EXP_DEVCTL_AUX_PME 0x0400 /* Auxiliary Power PM Enable */ +#define PCI_EXP_DEVCTL_NOSNOOP 0x0800 /* Enable No Snoop */ +#define PCI_EXP_DEVCTL_READRQ 0x7000 /* Max_Read_Request_Size */ +#define PCI_EXP_DEVCTL_BCRE 0x8000 /* Bridge Configuration Retry Enable */ +#define PCI_EXP_DEVCTL_FLRESET 0x8000 /* Function-Level Reset [bit shared with BCRE] */ +#define PCI_EXP_DEVSTA 0xa /* Device Status */ +#define PCI_EXP_DEVSTA_CED 0x01 /* Correctable Error Detected */ +#define PCI_EXP_DEVSTA_NFED 0x02 /* Non-Fatal Error Detected */ +#define PCI_EXP_DEVSTA_FED 0x04 /* Fatal Error Detected */ +#define PCI_EXP_DEVSTA_URD 0x08 /* Unsupported Request Detected */ +#define PCI_EXP_DEVSTA_AUXPD 0x10 /* AUX Power Detected */ +#define PCI_EXP_DEVSTA_TRPND 0x20 /* Transactions Pending */ +#define PCI_EXP_LNKCAP 0xc /* Link Capabilities */ +#define PCI_EXP_LNKCAP_SPEED 0x0000f /* Maximum Link Speed */ +#define PCI_EXP_LNKCAP_WIDTH 0x003f0 /* Maximum Link Width */ +#define PCI_EXP_LNKCAP_ASPM 0x00c00 /* Active State Power Management */ +#define PCI_EXP_LNKCAP_L0S 0x07000 /* L0s Acceptable Latency */ +#define PCI_EXP_LNKCAP_L1 0x38000 /* L1 Acceptable Latency */ +#define PCI_EXP_LNKCAP_CLOCKPM 0x40000 /* Clock Power Management */ +#define PCI_EXP_LNKCAP_SURPRISE 0x80000 /* Surprise Down Error Reporting */ +#define PCI_EXP_LNKCAP_DLLA 0x100000 /* Data Link Layer Active Reporting */ +#define PCI_EXP_LNKCAP_LBNC 0x200000 /* Link Bandwidth Notification Capability */ +#define PCI_EXP_LNKCAP_PORT 0xff000000 /* Port Number */ +#define PCI_EXP_LNKCTL 0x10 /* Link Control */ +#define PCI_EXP_LNKCTL_ASPM 0x0003 /* ASPM Control */ +#define PCI_EXP_LNKCTL_RCB 0x0008 /* Read Completion Boundary */ +#define PCI_EXP_LNKCTL_DISABLE 0x0010 /* Link Disable */ +#define PCI_EXP_LNKCTL_RETRAIN 0x0020 /* Retrain Link */ +#define PCI_EXP_LNKCTL_CLOCK 0x0040 /* Common Clock Configuration */ +#define PCI_EXP_LNKCTL_XSYNCH 0x0080 /* Extended Synch */ +#define PCI_EXP_LNKCTL_CLOCKPM 0x0100 /* Clock Power Management */ +#define PCI_EXP_LNKCTL_HWAUTWD 0x0200 /* Hardware Autonomous Width Disable */ +#define PCI_EXP_LNKCTL_BWMIE 0x0400 /* Bandwidth Mgmt Interrupt Enable */ +#define PCI_EXP_LNKCTL_AUTBWIE 0x0800 /* Autonomous Bandwidth Mgmt Interrupt Enable */ +#define PCI_EXP_LNKSTA 0x12 /* Link Status */ +#define PCI_EXP_LNKSTA_SPEED 0x000f /* Negotiated Link Speed */ +#define PCI_EXP_LNKSTA_WIDTH 0x03f0 /* Negotiated Link Width */ +#define PCI_EXP_LNKSTA_TR_ERR 0x0400 /* Training Error (obsolete) */ +#define PCI_EXP_LNKSTA_TRAIN 0x0800 /* Link Training */ +#define PCI_EXP_LNKSTA_SL_CLK 0x1000 /* Slot Clock Configuration */ +#define PCI_EXP_LNKSTA_DL_ACT 0x2000 /* Data Link Layer in DL_Active State */ +#define PCI_EXP_LNKSTA_BWMGMT 0x4000 /* Bandwidth Mgmt Status */ +#define PCI_EXP_LNKSTA_AUTBW 0x8000 /* Autonomous Bandwidth Mgmt Status */ +#define PCI_EXP_SLTCAP 0x14 /* Slot Capabilities */ +#define PCI_EXP_SLTCAP_ATNB 0x0001 /* Attention Button Present */ +#define PCI_EXP_SLTCAP_PWRC 0x0002 /* Power Controller Present */ +#define PCI_EXP_SLTCAP_MRL 0x0004 /* MRL Sensor Present */ +#define PCI_EXP_SLTCAP_ATNI 0x0008 /* Attention Indicator Present */ +#define PCI_EXP_SLTCAP_PWRI 0x0010 /* Power Indicator Present */ +#define PCI_EXP_SLTCAP_HPS 0x0020 /* 
Hot-Plug Surprise */ +#define PCI_EXP_SLTCAP_HPC 0x0040 /* Hot-Plug Capable */ +#define PCI_EXP_SLTCAP_PWR_VAL 0x00007f80 /* Slot Power Limit Value */ +#define PCI_EXP_SLTCAP_PWR_SCL 0x00018000 /* Slot Power Limit Scale */ +#define PCI_EXP_SLTCAP_INTERLOCK 0x020000 /* Electromechanical Interlock Present */ +#define PCI_EXP_SLTCAP_NOCMDCOMP 0x040000 /* No Command Completed Support */ +#define PCI_EXP_SLTCAP_PSN 0xfff80000 /* Physical Slot Number */ +#define PCI_EXP_SLTCTL 0x18 /* Slot Control */ +#define PCI_EXP_SLTCTL_ATNB 0x0001 /* Attention Button Pressed Enable */ +#define PCI_EXP_SLTCTL_PWRF 0x0002 /* Power Fault Detected Enable */ +#define PCI_EXP_SLTCTL_MRLS 0x0004 /* MRL Sensor Changed Enable */ +#define PCI_EXP_SLTCTL_PRSD 0x0008 /* Presence Detect Changed Enable */ +#define PCI_EXP_SLTCTL_CMDC 0x0010 /* Command Completed Interrupt Enable */ +#define PCI_EXP_SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */ +#define PCI_EXP_SLTCTL_ATNI 0x00c0 /* Attention Indicator Control */ +#define PCI_EXP_SLTCTL_PWRI 0x0300 /* Power Indicator Control */ +#define PCI_EXP_SLTCTL_PWRC 0x0400 /* Power Controller Control */ +#define PCI_EXP_SLTCTL_INTERLOCK 0x0800 /* Electromechanical Interlock Control */ +#define PCI_EXP_SLTCTL_LLCHG 0x1000 /* Data Link Layer State Changed Enable */ +#define PCI_EXP_SLTSTA 0x1a /* Slot Status */ +#define PCI_EXP_SLTSTA_ATNB 0x0001 /* Attention Button Pressed */ +#define PCI_EXP_SLTSTA_PWRF 0x0002 /* Power Fault Detected */ +#define PCI_EXP_SLTSTA_MRLS 0x0004 /* MRL Sensor Changed */ +#define PCI_EXP_SLTSTA_PRSD 0x0008 /* Presence Detect Changed */ +#define PCI_EXP_SLTSTA_CMDC 0x0010 /* Command Completed */ +#define PCI_EXP_SLTSTA_MRL_ST 0x0020 /* MRL Sensor State */ +#define PCI_EXP_SLTSTA_PRES 0x0040 /* Presence Detect State */ +#define PCI_EXP_SLTSTA_INTERLOCK 0x0080 /* Electromechanical Interlock Status */ +#define PCI_EXP_SLTSTA_LLCHG 0x0100 /* Data Link Layer State Changed */ +#define PCI_EXP_RTCTL 0x1c /* Root Control */ +#define PCI_EXP_RTCTL_SECEE 0x0001 /* System Error on Correctable Error */ +#define PCI_EXP_RTCTL_SENFEE 0x0002 /* System Error on Non-Fatal Error */ +#define PCI_EXP_RTCTL_SEFEE 0x0004 /* System Error on Fatal Error */ +#define PCI_EXP_RTCTL_PMEIE 0x0008 /* PME Interrupt Enable */ +#define PCI_EXP_RTCTL_CRSVIS 0x0010 /* Configuration Request Retry Status Visible to SW */ +#define PCI_EXP_RTCAP 0x1e /* Root Capabilities */ +#define PCI_EXP_RTCAP_CRSVIS 0x0010 /* Configuration Request Retry Status Visible to SW */ +#define PCI_EXP_RTSTA 0x20 /* Root Status */ +#define PCI_EXP_RTSTA_PME_REQID 0x0000ffff /* PME Requester ID */ +#define PCI_EXP_RTSTA_PME_STATUS 0x00010000 /* PME Status */ +#define PCI_EXP_RTSTA_PME_PENDING 0x00020000 /* PME is Pending */ +#define PCI_EXP_DEVCAP2 0x24 /* Device capabilities 2 */ +#define PCI_EXP_DEVCTL2 0x28 /* Device Control */ +#define PCI_EXP_DEV2_TIMEOUT_RANGE(x) ((x) & 0xf) /* Completion Timeout Ranges Supported */ +#define PCI_EXP_DEV2_TIMEOUT_VALUE(x) ((x) & 0xf) /* Completion Timeout Value */ +#define PCI_EXP_DEV2_TIMEOUT_DIS 0x0010 /* Completion Timeout Disable Supported */ +#define PCI_EXP_DEV2_ARI 0x0020 /* ARI Forwarding */ +#define PCI_EXP_DEVSTA2 0x2a /* Device Status */ +#define PCI_EXP_LNKCAP2 0x2c /* Link Capabilities */ +#define PCI_EXP_LNKCTL2 0x30 /* Link Control */ +#define PCI_EXP_LNKCTL2_SPEED(x) ((x) & 0xf) /* Target Link Speed */ +#define PCI_EXP_LNKCTL2_CMPLNC 0x0010 /* Enter Compliance */ +#define PCI_EXP_LNKCTL2_SPEED_DIS 0x0020 /* Hardware Autonomous Speed Disable */ +#define 
PCI_EXP_LNKCTL2_DEEMPHASIS(x) (((x) >> 6) & 1) /* Selectable De-emphasis */ +#define PCI_EXP_LNKCTL2_MARGIN(x) (((x) >> 7) & 7) /* Transmit Margin */ +#define PCI_EXP_LNKCTL2_MOD_CMPLNC 0x0400 /* Enter Modified Compliance */ +#define PCI_EXP_LNKCTL2_CMPLNC_SOS 0x0800 /* Compliance SOS */ +#define PCI_EXP_LNKCTL2_COM_DEEMPHASIS(x) (((x) >> 12) & 1) /* Compliance De-emphasis */ +#define PCI_EXP_LNKSTA2 0x32 /* Link Status */ +#define PCI_EXP_LINKSTA2_DEEMPHASIS(x) ((x) & 1) /* Current De-emphasis Level */ +#define PCI_EXP_SLTCAP2 0x34 /* Slot Capabilities */ +#define PCI_EXP_SLTCTL2 0x38 /* Slot Control */ +#define PCI_EXP_SLTSTA2 0x3a /* Slot Status */ + +/* MSI-X */ +#define PCI_MSIX_ENABLE 0x8000 +#define PCI_MSIX_MASK 0x4000 +#define PCI_MSIX_TABSIZE 0x07ff +#define PCI_MSIX_TABLE 4 +#define PCI_MSIX_PBA 8 +#define PCI_MSIX_BIR 0x7 + +/* Subsystem vendor/device ID for PCI bridges */ +#define PCI_SSVID_VENDOR 4 +#define PCI_SSVID_DEVICE 6 + +/* PCI Advanced Features */ +#define PCI_AF_CAP 3 +#define PCI_AF_CAP_TP 0x01 +#define PCI_AF_CAP_FLR 0x02 +#define PCI_AF_CTRL 4 +#define PCI_AF_CTRL_FLR 0x01 +#define PCI_AF_STATUS 5 +#define PCI_AF_STATUS_TP 0x01 + +/* SATA Host Bus Adapter */ +#define PCI_SATA_HBA_BARS 4 +#define PCI_SATA_HBA_REG0 8 + +/*** Definitions of extended capabilities ***/ + +/* Advanced Error Reporting */ +#define PCI_ERR_UNCOR_STATUS 4 /* Uncorrectable Error Status */ +#define PCI_ERR_UNC_TRAIN 0x00000001 /* Undefined in PCIe rev1.1 & 2.0 spec */ +#define PCI_ERR_UNC_DLP 0x00000010 /* Data Link Protocol */ +#define PCI_ERR_UNC_SDES 0x00000020 /* Surprise Down Error */ +#define PCI_ERR_UNC_POISON_TLP 0x00001000 /* Poisoned TLP */ +#define PCI_ERR_UNC_FCP 0x00002000 /* Flow Control Protocol */ +#define PCI_ERR_UNC_COMP_TIME 0x00004000 /* Completion Timeout */ +#define PCI_ERR_UNC_COMP_ABORT 0x00008000 /* Completer Abort */ +#define PCI_ERR_UNC_UNX_COMP 0x00010000 /* Unexpected Completion */ +#define PCI_ERR_UNC_RX_OVER 0x00020000 /* Receiver Overflow */ +#define PCI_ERR_UNC_MALF_TLP 0x00040000 /* Malformed TLP */ +#define PCI_ERR_UNC_ECRC 0x00080000 /* ECRC Error Status */ +#define PCI_ERR_UNC_UNSUP 0x00100000 /* Unsupported Request */ +#define PCI_ERR_UNC_ACS_VIOL 0x00200000 /* ACS Violation */ +#define PCI_ERR_UNCOR_MASK 8 /* Uncorrectable Error Mask */ + /* Same bits as above */ +#define PCI_ERR_UNCOR_SEVER 12 /* Uncorrectable Error Severity */ + /* Same bits as above */ +#define PCI_ERR_COR_STATUS 16 /* Correctable Error Status */ +#define PCI_ERR_COR_RCVR 0x00000001 /* Receiver Error Status */ +#define PCI_ERR_COR_BAD_TLP 0x00000040 /* Bad TLP Status */ +#define PCI_ERR_COR_BAD_DLLP 0x00000080 /* Bad DLLP Status */ +#define PCI_ERR_COR_REP_ROLL 0x00000100 /* REPLAY_NUM Rollover */ +#define PCI_ERR_COR_REP_TIMER 0x00001000 /* Replay Timer Timeout */ +#define PCI_ERR_COR_REP_ANFE 0x00002000 /* Advisory Non-Fatal Error */ +#define PCI_ERR_COR_MASK 20 /* Correctable Error Mask */ + /* Same bits as above */ +#define PCI_ERR_CAP 24 /* Advanced Error Capabilities */ +#define PCI_ERR_CAP_FEP(x) ((x) & 31) /* First Error Pointer */ +#define PCI_ERR_CAP_ECRC_GENC 0x00000020 /* ECRC Generation Capable */ +#define PCI_ERR_CAP_ECRC_GENE 0x00000040 /* ECRC Generation Enable */ +#define PCI_ERR_CAP_ECRC_CHKC 0x00000080 /* ECRC Check Capable */ +#define PCI_ERR_CAP_ECRC_CHKE 0x00000100 /* ECRC Check Enable */ +#define PCI_ERR_HEADER_LOG 28 /* Header Log Register (16 bytes) */ +#define PCI_ERR_ROOT_COMMAND 44 /* Root Error Command */ +#define PCI_ERR_ROOT_STATUS 48 +#define 
PCI_ERR_ROOT_COR_SRC 52 +#define PCI_ERR_ROOT_SRC 54 + +/* Virtual Channel */ +#define PCI_VC_PORT_REG1 4 +#define PCI_VC_PORT_REG2 8 +#define PCI_VC_PORT_CTRL 12 +#define PCI_VC_PORT_STATUS 14 +#define PCI_VC_RES_CAP 16 +#define PCI_VC_RES_CTRL 20 +#define PCI_VC_RES_STATUS 26 + +/* Power Budgeting */ +#define PCI_PWR_DSR 4 /* Data Select Register */ +#define PCI_PWR_DATA 8 /* Data Register */ +#define PCI_PWR_DATA_BASE(x) ((x) & 0xff) /* Base Power */ +#define PCI_PWR_DATA_SCALE(x) (((x) >> 8) & 3) /* Data Scale */ +#define PCI_PWR_DATA_PM_SUB(x) (((x) >> 10) & 7) /* PM Sub State */ +#define PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */ +#define PCI_PWR_DATA_TYPE(x) (((x) >> 15) & 7) /* Type */ +#define PCI_PWR_DATA_RAIL(x) (((x) >> 18) & 7) /* Power Rail */ +#define PCI_PWR_CAP 12 /* Capability */ +#define PCI_PWR_CAP_BUDGET(x) ((x) & 1) /* Included in system budget */ + +/* Root Complex Link */ +#define PCI_RCLINK_ESD 4 /* Element Self Description */ +#define PCI_RCLINK_LINK1 16 /* First Link Entry */ +#define PCI_RCLINK_LINK_DESC 0 /* Link Entry: Description */ +#define PCI_RCLINK_LINK_ADDR 8 /* Link Entry: Address (64-bit) */ +#define PCI_RCLINK_LINK_SIZE 16 /* Link Entry: sizeof */ + +/* PCIe Vendor-Specific Capability */ +#define PCI_EVNDR_HEADER 4 /* Vendor-Specific Header */ +#define PCI_EVNDR_REGISTERS 8 /* Vendor-Specific Registers */ + +/* Access Control Services */ +#define PCI_ACS_CAP 0x04 /* ACS Capability Register */ +#define PCI_ACS_CAP_VALID 0x0001 /* ACS Source Validation */ +#define PCI_ACS_CAP_BLOCK 0x0002 /* ACS Translation Blocking */ +#define PCI_ACS_CAP_REQ_RED 0x0004 /* ACS P2P Request Redirect */ +#define PCI_ACS_CAP_CMPLT_RED 0x0008 /* ACS P2P Completion Redirect */ +#define PCI_ACS_CAP_FORWARD 0x0010 /* ACS Upstream Forwarding */ +#define PCI_ACS_CAP_EGRESS 0x0020 /* ACS P2P Egress Control */ +#define PCI_ACS_CAP_TRANS 0x0040 /* ACS Direct Translated P2P */ +#define PCI_ACS_CAP_VECTOR(x) (((x) >> 8) & 0xff) /* Egress Control Vector Size */ +#define PCI_ACS_CTRL 0x06 /* ACS Control Register */ +#define PCI_ACS_CTRL_VALID 0x0001 /* ACS Source Validation Enable */ +#define PCI_ACS_CTRL_BLOCK 0x0002 /* ACS Translation Blocking Enable */ +#define PCI_ACS_CTRL_REQ_RED 0x0004 /* ACS P2P Request Redirect Enable */ +#define PCI_ACS_CTRL_CMPLT_RED 0x0008 /* ACS P2P Completion Redirect Enable */ +#define PCI_ACS_CTRL_FORWARD 0x0010 /* ACS Upstream Forwarding Enable */ +#define PCI_ACS_CTRL_EGRESS 0x0020 /* ACS P2P Egress Control Enable */ +#define PCI_ACS_CTRL_TRANS 0x0040 /* ACS Direct Translated P2P Enable */ +#define PCI_ACS_EGRESS_CTRL 0x08 /* Egress Control Vector */ + +/* Alternative Routing-ID Interpretation */ +#define PCI_ARI_CAP 0x04 /* ARI Capability Register */ +#define PCI_ARI_CAP_MFVC 0x0001 /* MFVC Function Groups Capability */ +#define PCI_ARI_CAP_ACS 0x0002 /* ACS Function Groups Capability */ +#define PCI_ARI_CAP_NFN(x) (((x) >> 8) & 0xff) /* Next Function Number */ +#define PCI_ARI_CTRL 0x06 /* ARI Control Register */ +#define PCI_ARI_CTRL_MFVC 0x0001 /* MFVC Function Groups Enable */ +#define PCI_ARI_CTRL_ACS 0x0002 /* ACS Function Groups Enable */ +#define PCI_ARI_CTRL_FG(x) (((x) >> 4) & 7) /* Function Group */ + +/* Address Translation Service */ +#define PCI_ATS_CAP 0x04 /* ATS Capability Register */ +#define PCI_ATS_CAP_IQD(x) ((x) & 0x1f) /* Invalidate Queue Depth */ +#define PCI_ATS_CTRL 0x06 /* ATS Control Register */ +#define PCI_ATS_CTRL_STU(x) ((x) & 0x1f) /* Smallest Translation Unit */ +#define PCI_ATS_CTRL_ENABLE 0x8000 /* 
ATS Enable */ + +/* Single Root I/O Virtualization */ +#define PCI_IOV_CAP 0x04 /* SR-IOV Capability Register */ +#define PCI_IOV_CAP_VFM 0x00000001 /* VF Migration Capable */ +#define PCI_IOV_CAP_IMN(x) ((x) >> 21) /* VF Migration Interrupt Message Number */ +#define PCI_IOV_CTRL 0x08 /* SR-IOV Control Register */ +#define PCI_IOV_CTRL_VFE 0x0001 /* VF Enable */ +#define PCI_IOV_CTRL_VFME 0x0002 /* VF Migration Enable */ +#define PCI_IOV_CTRL_VFMIE 0x0004 /* VF Migration Interrupt Enable */ +#define PCI_IOV_CTRL_MSE 0x0008 /* VF MSE */ +#define PCI_IOV_CTRL_ARI 0x0010 /* ARI Capable Hierarchy */ +#define PCI_IOV_STATUS 0x0a /* SR-IOV Status Register */ +#define PCI_IOV_STATUS_MS 0x0001 /* VF Migration Status */ +#define PCI_IOV_INITIALVF 0x0c /* Number of VFs that are initially associated */ +#define PCI_IOV_TOTALVF 0x0e /* Maximum number of VFs that could be associated */ +#define PCI_IOV_NUMVF 0x10 /* Number of VFs that are available */ +#define PCI_IOV_FDL 0x12 /* Function Dependency Link */ +#define PCI_IOV_OFFSET 0x14 /* First VF Offset */ +#define PCI_IOV_STRIDE 0x16 /* Routing ID offset from one VF to the next one */ +#define PCI_IOV_DID 0x1a /* VF Device ID */ +#define PCI_IOV_SUPPS 0x1c /* Supported Page Sizes */ +#define PCI_IOV_SYSPS 0x20 /* System Page Size */ +#define PCI_IOV_BAR_BASE 0x24 /* VF BAR0, VF BAR1, ... VF BAR5 */ +#define PCI_IOV_NUM_BAR 6 /* Number of VF BARs */ +#define PCI_IOV_MSAO 0x3c /* VF Migration State Array Offset */ +#define PCI_IOV_MSA_BIR(x) ((x) & 7) /* VF Migration State BIR */ +#define PCI_IOV_MSA_OFFSET(x) ((x) & 0xfffffff8) /* VF Migration State Offset */ + +/* Transaction Processing Hints */ +#define PCI_TPH_CAPABILITIES 4 +#define PCI_TPH_INTVEC_SUP (1<<1) /* Supports interrupt vector mode */ +#define PCI_TPH_DEV_SUP (1<<2) /* Device specific mode supported */ +#define PCI_TPH_EXT_REQ_SUP (1<<8) /* Supports extended requests */ +#define PCI_TPH_ST_LOC_MASK (3<<9) /* Steering table location bits */ +#define PCI_TPH_ST_NONE (0<<9) /* No steering table */ +#define PCI_TPH_ST_CAP (1<<9) /* Steering table in TPH cap */ +#define PCI_TPH_ST_MSIX (2<<9) /* Steering table in MSI-X table */ +#define PCI_TPH_ST_SIZE_SHIFT (16) /* Encoded as size - 1 */ + +/* Latency Tolerance Reporting */ +#define PCI_LTR_MAX_SNOOP 4 /* 16 bit value */ +#define PCI_LTR_VALUE_MASK (0x3ff) +#define PCI_LTR_SCALE_SHIFT (10) +#define PCI_LTR_SCALE_MASK (7) +#define PCI_LTR_MAX_NOSNOOP 6 /* 16 bit value */ + +/* + * The PCI interface treats multi-function devices as independent + * devices. 
The slot/function address of each device is encoded + * in a single byte as follows: + * + * 7:3 = slot + * 2:0 = function + */ +#define PCI_DEVFN(slot,func) ((((slot) & 0x1f) << 3) | ((func) & 0x07)) +#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f) +#define PCI_FUNC(devfn) ((devfn) & 0x07) + +/* Device classes and subclasses */ + +#define PCI_CLASS_NOT_DEFINED 0x0000 +#define PCI_CLASS_NOT_DEFINED_VGA 0x0001 + +#define PCI_BASE_CLASS_STORAGE 0x01 +#define PCI_CLASS_STORAGE_SCSI 0x0100 +#define PCI_CLASS_STORAGE_IDE 0x0101 +#define PCI_CLASS_STORAGE_FLOPPY 0x0102 +#define PCI_CLASS_STORAGE_IPI 0x0103 +#define PCI_CLASS_STORAGE_RAID 0x0104 +#define PCI_CLASS_STORAGE_ATA 0x0105 +#define PCI_CLASS_STORAGE_SATA 0x0106 +#define PCI_CLASS_STORAGE_SAS 0x0107 +#define PCI_CLASS_STORAGE_OTHER 0x0180 + +#define PCI_BASE_CLASS_NETWORK 0x02 +#define PCI_CLASS_NETWORK_ETHERNET 0x0200 +#define PCI_CLASS_NETWORK_TOKEN_RING 0x0201 +#define PCI_CLASS_NETWORK_FDDI 0x0202 +#define PCI_CLASS_NETWORK_ATM 0x0203 +#define PCI_CLASS_NETWORK_ISDN 0x0204 +#define PCI_CLASS_NETWORK_OTHER 0x0280 + +#define PCI_BASE_CLASS_DISPLAY 0x03 +#define PCI_CLASS_DISPLAY_VGA 0x0300 +#define PCI_CLASS_DISPLAY_XGA 0x0301 +#define PCI_CLASS_DISPLAY_3D 0x0302 +#define PCI_CLASS_DISPLAY_OTHER 0x0380 + +#define PCI_BASE_CLASS_MULTIMEDIA 0x04 +#define PCI_CLASS_MULTIMEDIA_VIDEO 0x0400 +#define PCI_CLASS_MULTIMEDIA_AUDIO 0x0401 +#define PCI_CLASS_MULTIMEDIA_PHONE 0x0402 +#define PCI_CLASS_MULTIMEDIA_AUDIO_DEV 0x0403 +#define PCI_CLASS_MULTIMEDIA_OTHER 0x0480 + +#define PCI_BASE_CLASS_MEMORY 0x05 +#define PCI_CLASS_MEMORY_RAM 0x0500 +#define PCI_CLASS_MEMORY_FLASH 0x0501 +#define PCI_CLASS_MEMORY_OTHER 0x0580 + +#define PCI_BASE_CLASS_BRIDGE 0x06 +#define PCI_CLASS_BRIDGE_HOST 0x0600 +#define PCI_CLASS_BRIDGE_ISA 0x0601 +#define PCI_CLASS_BRIDGE_EISA 0x0602 +#define PCI_CLASS_BRIDGE_MC 0x0603 +#define PCI_CLASS_BRIDGE_PCI 0x0604 +#define PCI_CLASS_BRIDGE_PCMCIA 0x0605 +#define PCI_CLASS_BRIDGE_NUBUS 0x0606 +#define PCI_CLASS_BRIDGE_CARDBUS 0x0607 +#define PCI_CLASS_BRIDGE_RACEWAY 0x0608 +#define PCI_CLASS_BRIDGE_PCI_SEMI 0x0609 +#define PCI_CLASS_BRIDGE_IB_TO_PCI 0x060a +#define PCI_CLASS_BRIDGE_OTHER 0x0680 + +#define PCI_BASE_CLASS_COMMUNICATION 0x07 +#define PCI_CLASS_COMMUNICATION_SERIAL 0x0700 +#define PCI_CLASS_COMMUNICATION_PARALLEL 0x0701 +#define PCI_CLASS_COMMUNICATION_MSERIAL 0x0702 +#define PCI_CLASS_COMMUNICATION_MODEM 0x0703 +#define PCI_CLASS_COMMUNICATION_OTHER 0x0780 + +#define PCI_BASE_CLASS_SYSTEM 0x08 +#define PCI_CLASS_SYSTEM_PIC 0x0800 +#define PCI_CLASS_SYSTEM_DMA 0x0801 +#define PCI_CLASS_SYSTEM_TIMER 0x0802 +#define PCI_CLASS_SYSTEM_RTC 0x0803 +#define PCI_CLASS_SYSTEM_PCI_HOTPLUG 0x0804 +#define PCI_CLASS_SYSTEM_OTHER 0x0880 + +#define PCI_BASE_CLASS_INPUT 0x09 +#define PCI_CLASS_INPUT_KEYBOARD 0x0900 +#define PCI_CLASS_INPUT_PEN 0x0901 +#define PCI_CLASS_INPUT_MOUSE 0x0902 +#define PCI_CLASS_INPUT_SCANNER 0x0903 +#define PCI_CLASS_INPUT_GAMEPORT 0x0904 +#define PCI_CLASS_INPUT_OTHER 0x0980 + +#define PCI_BASE_CLASS_DOCKING 0x0a +#define PCI_CLASS_DOCKING_GENERIC 0x0a00 +#define PCI_CLASS_DOCKING_OTHER 0x0a80 + +#define PCI_BASE_CLASS_PROCESSOR 0x0b +#define PCI_CLASS_PROCESSOR_386 0x0b00 +#define PCI_CLASS_PROCESSOR_486 0x0b01 +#define PCI_CLASS_PROCESSOR_PENTIUM 0x0b02 +#define PCI_CLASS_PROCESSOR_ALPHA 0x0b10 +#define PCI_CLASS_PROCESSOR_POWERPC 0x0b20 +#define PCI_CLASS_PROCESSOR_MIPS 0x0b30 +#define PCI_CLASS_PROCESSOR_CO 0x0b40 + +#define PCI_BASE_CLASS_SERIAL 0x0c +#define PCI_CLASS_SERIAL_FIREWIRE 
0x0c00 +#define PCI_CLASS_SERIAL_ACCESS 0x0c01 +#define PCI_CLASS_SERIAL_SSA 0x0c02 +#define PCI_CLASS_SERIAL_USB 0x0c03 +#define PCI_CLASS_SERIAL_FIBER 0x0c04 +#define PCI_CLASS_SERIAL_SMBUS 0x0c05 +#define PCI_CLASS_SERIAL_INFINIBAND 0x0c06 + +#define PCI_BASE_CLASS_WIRELESS 0x0d +#define PCI_CLASS_WIRELESS_IRDA 0x0d00 +#define PCI_CLASS_WIRELESS_CONSUMER_IR 0x0d01 +#define PCI_CLASS_WIRELESS_RF 0x0d10 +#define PCI_CLASS_WIRELESS_OTHER 0x0d80 + +#define PCI_BASE_CLASS_INTELLIGENT 0x0e +#define PCI_CLASS_INTELLIGENT_I2O 0x0e00 + +#define PCI_BASE_CLASS_SATELLITE 0x0f +#define PCI_CLASS_SATELLITE_TV 0x0f00 +#define PCI_CLASS_SATELLITE_AUDIO 0x0f01 +#define PCI_CLASS_SATELLITE_VOICE 0x0f03 +#define PCI_CLASS_SATELLITE_DATA 0x0f04 + +#define PCI_BASE_CLASS_CRYPT 0x10 +#define PCI_CLASS_CRYPT_NETWORK 0x1000 +#define PCI_CLASS_CRYPT_ENTERTAINMENT 0x1010 +#define PCI_CLASS_CRYPT_OTHER 0x1080 + +#define PCI_BASE_CLASS_SIGNAL 0x11 +#define PCI_CLASS_SIGNAL_DPIO 0x1100 +#define PCI_CLASS_SIGNAL_PERF_CTR 0x1101 +#define PCI_CLASS_SIGNAL_SYNCHRONIZER 0x1110 +#define PCI_CLASS_SIGNAL_OTHER 0x1180 + +#define PCI_CLASS_OTHERS 0xff + +/* Several ID's we need in the library */ + +#define PCI_VENDOR_ID_INTEL 0x8086 +#define PCI_VENDOR_ID_COMPAQ 0x0e11 diff --git a/ext/hwloc/include/pci/pci.h b/ext/hwloc/include/pci/pci.h new file mode 100644 index 000000000..7a5a6b80c --- /dev/null +++ b/ext/hwloc/include/pci/pci.h @@ -0,0 +1,240 @@ +/* + * The PCI Library + * + * Copyright (c) 1997--2009 Martin Mares + * + * Can be freely distributed and used under the terms of the GNU GPL. + */ + +#ifndef _PCI_LIB_H +#define _PCI_LIB_H + +#ifndef PCI_CONFIG_H +#include "config.h" +#endif + +#include "header.h" +#include "types.h" + +#define PCI_LIB_VERSION 0x030100 + +#ifndef PCI_ABI +#define PCI_ABI +#endif + +/* + * PCI Access Structure + */ + +struct pci_methods; + +enum pci_access_type { + /* Known access methods, remember to update access.c as well */ + PCI_ACCESS_AUTO, /* Autodetection */ + PCI_ACCESS_SYS_BUS_PCI, /* Linux /sys/bus/pci */ + PCI_ACCESS_PROC_BUS_PCI, /* Linux /proc/bus/pci */ + PCI_ACCESS_I386_TYPE1, /* i386 ports, type 1 */ + PCI_ACCESS_I386_TYPE2, /* i386 ports, type 2 */ + PCI_ACCESS_FBSD_DEVICE, /* FreeBSD /dev/pci */ + PCI_ACCESS_AIX_DEVICE, /* /dev/pci0, /dev/bus0, etc. */ + PCI_ACCESS_NBSD_LIBPCI, /* NetBSD libpci */ + PCI_ACCESS_OBSD_DEVICE, /* OpenBSD /dev/pci */ + PCI_ACCESS_DUMP, /* Dump file */ + PCI_ACCESS_MAX +}; + +struct pci_access { + /* Options you can change: */ + unsigned int method; /* Access method */ + int writeable; /* Open in read/write mode */ + int buscentric; /* Bus-centric view of the world */ + + char *id_file_name; /* Name of ID list file (use pci_set_name_list_path()) */ + int free_id_name; /* Set if id_file_name is malloced */ + int numeric_ids; /* Enforce PCI_LOOKUP_NUMERIC (>1 => PCI_LOOKUP_MIXED) */ + + unsigned int id_lookup_mode; /* pci_lookup_mode flags which are set automatically */ + /* Default: PCI_LOOKUP_CACHE */ + + int debugging; /* Turn on debugging messages */ + + /* Functions you can override: */ + void (*error)(char *msg, ...) PCI_PRINTF(1,2); /* Write error message and quit */ + void (*warning)(char *msg, ...) PCI_PRINTF(1,2); /* Write a warning message */ + void (*debug)(char *msg, ...) 
PCI_PRINTF(1,2); /* Write a debugging message */ + + struct pci_dev *devices; /* Devices found on this bus */ + + /* Fields used internally: */ + struct pci_methods *methods; + struct pci_param *params; + struct id_entry **id_hash; /* names.c */ + struct id_bucket *current_id_bucket; + int id_load_failed; + int id_cache_status; /* 0=not read, 1=read, 2=dirty */ + int fd; /* proc/sys: fd for config space */ + int fd_rw; /* proc/sys: fd opened read-write */ + int fd_pos; /* proc/sys: current position */ + int fd_vpd; /* sys: fd for VPD */ + struct pci_dev *cached_dev; /* proc/sys: device the fds are for */ +}; + +/* Initialize PCI access */ +struct pci_access *pci_alloc(void) PCI_ABI; +void pci_init(struct pci_access *) PCI_ABI; +void pci_cleanup(struct pci_access *) PCI_ABI; + +/* Scanning of devices */ +void pci_scan_bus(struct pci_access *acc) PCI_ABI; +struct pci_dev *pci_get_dev(struct pci_access *acc, int domain, int bus, int dev, int func) PCI_ABI; /* Raw access to specified device */ +void pci_free_dev(struct pci_dev *) PCI_ABI; + +/* Names of access methods */ +int pci_lookup_method(char *name) PCI_ABI; /* Returns -1 if not found */ +char *pci_get_method_name(int index) PCI_ABI; /* Returns "" if unavailable, NULL if index out of range */ + +/* + * Named parameters + */ + +struct pci_param { + struct pci_param *next; /* Please use pci_walk_params() for traversing the list */ + char *param; /* Name of the parameter */ + char *value; /* Value of the parameter */ + int value_malloced; /* used internally */ + char *help; /* Explanation of the parameter */ +}; + +char *pci_get_param(struct pci_access *acc, char *param) PCI_ABI; +int pci_set_param(struct pci_access *acc, char *param, char *value) PCI_ABI; /* 0 on success, -1 if no such parameter */ +/* To traverse the list, call pci_walk_params repeatedly, first with prev=NULL, and do not modify the parameters during traversal. 
 */
+struct pci_param *pci_walk_params(struct pci_access *acc, struct pci_param *prev) PCI_ABI;
+
+/*
+ * Devices
+ */
+
+struct pci_dev {
+  struct pci_dev *next; /* Next device in the chain */
+  u16 domain; /* PCI domain (host bridge) */
+  u8 bus, dev, func; /* Bus inside domain, device and function */
+
+  /* These fields are set by pci_fill_info() */
+  int known_fields; /* Set of info fields already known */
+  u16 vendor_id, device_id; /* Identity of the device */
+  u16 device_class; /* PCI device class */
+  int irq; /* IRQ number */
+  pciaddr_t base_addr[6]; /* Base addresses including flags in lower bits */
+  pciaddr_t size[6]; /* Region sizes */
+  pciaddr_t rom_base_addr; /* Expansion ROM base address */
+  pciaddr_t rom_size; /* Expansion ROM size */
+  struct pci_cap *first_cap; /* List of capabilities */
+  char *phy_slot; /* Physical slot */
+
+  /* Fields used internally: */
+  struct pci_access *access;
+  struct pci_methods *methods;
+  u8 *cache; /* Cached config registers */
+  int cache_len;
+  int hdrtype; /* Cached low 7 bits of header type, -1 if unknown */
+  void *aux; /* Auxiliary data */
+};
+
+#define PCI_ADDR_IO_MASK (~(pciaddr_t) 0x3)
+#define PCI_ADDR_MEM_MASK (~(pciaddr_t) 0xf)
+#define PCI_ADDR_FLAG_MASK 0xf
+
+u8 pci_read_byte(struct pci_dev *, int pos) PCI_ABI; /* Access to configuration space */
+u16 pci_read_word(struct pci_dev *, int pos) PCI_ABI;
+u32 pci_read_long(struct pci_dev *, int pos) PCI_ABI;
+int pci_read_block(struct pci_dev *, int pos, u8 *buf, int len) PCI_ABI;
+int pci_read_vpd(struct pci_dev *d, int pos, u8 *buf, int len) PCI_ABI;
+int pci_write_byte(struct pci_dev *, int pos, u8 data) PCI_ABI;
+int pci_write_word(struct pci_dev *, int pos, u16 data) PCI_ABI;
+int pci_write_long(struct pci_dev *, int pos, u32 data) PCI_ABI;
+int pci_write_block(struct pci_dev *, int pos, u8 *buf, int len) PCI_ABI;
+
+int pci_fill_info(struct pci_dev *, int flags) PCI_ABI; /* Fill in device information */
+
+#define PCI_FILL_IDENT 1
+#define PCI_FILL_IRQ 2
+#define PCI_FILL_BASES 4
+#define PCI_FILL_ROM_BASE 8
+#define PCI_FILL_SIZES 16
+#define PCI_FILL_CLASS 32
+#define PCI_FILL_CAPS 64
+#define PCI_FILL_EXT_CAPS 128
+#define PCI_FILL_PHYS_SLOT 256
+#define PCI_FILL_RESCAN 0x10000
+
+void pci_setup_cache(struct pci_dev *, u8 *cache, int len) PCI_ABI;
+
+/*
+ * Capabilities
+ */
+
+struct pci_cap {
+  struct pci_cap *next;
+  u16 id; /* PCI_CAP_ID_xxx */
+  u16 type; /* PCI_CAP_xxx */
+  unsigned int addr; /* Position in the config space */
+};
+
+#define PCI_CAP_NORMAL 1 /* Traditional PCI capabilities */
+#define PCI_CAP_EXTENDED 2 /* PCIe extended capabilities */
+
+struct pci_cap *pci_find_cap(struct pci_dev *, unsigned int id, unsigned int type) PCI_ABI;
+
+/*
+ * Filters
+ */
+
+struct pci_filter {
+  int domain, bus, slot, func; /* -1 = ANY */
+  int vendor, device;
+};
+
+void pci_filter_init(struct pci_access *, struct pci_filter *) PCI_ABI;
+char *pci_filter_parse_slot(struct pci_filter *, char *) PCI_ABI;
+char *pci_filter_parse_id(struct pci_filter *, char *) PCI_ABI;
+int pci_filter_match(struct pci_filter *, struct pci_dev *) PCI_ABI;
+
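+/*
+ * Editor's note: an illustrative sketch, not part of the original header.
+ * Typical filter usage: initialize, optionally narrow the filter from a
+ * slot or ID string, then test scanned devices.  The ID string below is a
+ * hypothetical example, and the parse functions are assumed to return NULL
+ * on success and an error message otherwise:
+ */
+#if 0 /* illustration only */
+struct pci_filter flt;
+struct pci_dev *d;
+char *err;
+
+pci_filter_init(acc, &flt);
+if ((err = pci_filter_parse_id(&flt, "8086:"))) /* any device of vendor 8086 */
+  acc->error("Bad filter: %s", err);
+for (d = acc->devices; d; d = d->next)
+  if (pci_filter_match(&flt, d))
+    /* use the matching device */;
+#endif
+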
+/*
+ * Conversion of PCI ID's to names (according to the pci.ids file)
+ *
+ * Call pci_lookup_name() to identify different types of ID's:
+ *
+ *	VENDOR				(vendorID) -> vendor
+ *	DEVICE				(vendorID, deviceID) -> device
+ *	VENDOR | DEVICE			(vendorID, deviceID) -> combined vendor and device
+ *	SUBSYSTEM | VENDOR		(subvendorID) -> subsystem vendor
+ *	SUBSYSTEM | DEVICE		(vendorID, deviceID, subvendorID, subdevID) -> subsystem device
+ *	SUBSYSTEM | VENDOR | DEVICE	(vendorID, deviceID, subvendorID, subdevID) -> combined subsystem v+d
+ *	SUBSYSTEM | ...			(-1, -1, subvendorID, subdevID) -> generic subsystem
+ *	CLASS				(classID) -> class
+ *	PROGIF				(classID, progif) -> programming interface
+ */
+
+char *pci_lookup_name(struct pci_access *a, char *buf, int size, int flags, ...) PCI_ABI;
+
+int pci_load_name_list(struct pci_access *a) PCI_ABI; /* Called automatically by pci_lookup_*() when needed; returns success */
+void pci_free_name_list(struct pci_access *a) PCI_ABI; /* Called automatically by pci_cleanup() */
+void pci_set_name_list_path(struct pci_access *a, char *name, int to_be_freed) PCI_ABI;
+void pci_id_cache_flush(struct pci_access *a) PCI_ABI;
+
+enum pci_lookup_mode {
+  PCI_LOOKUP_VENDOR = 1, /* Vendor name (args: vendorID) */
+  PCI_LOOKUP_DEVICE = 2, /* Device name (args: vendorID, deviceID) */
+  PCI_LOOKUP_CLASS = 4, /* Device class (args: classID) */
+  PCI_LOOKUP_SUBSYSTEM = 8,
+  PCI_LOOKUP_PROGIF = 16, /* Programming interface (args: classID, prog_if) */
+  PCI_LOOKUP_NUMERIC = 0x10000, /* Want only formatted numbers; default if access->numeric_ids is set */
+  PCI_LOOKUP_NO_NUMBERS = 0x20000, /* Return NULL if not found in the database; default is to print numerically */
+  PCI_LOOKUP_MIXED = 0x40000, /* Include both numbers and names */
+  PCI_LOOKUP_NETWORK = 0x80000, /* Try to resolve unknown ID's by DNS */
+  PCI_LOOKUP_SKIP_LOCAL = 0x100000, /* Do not consult local database */
+  PCI_LOOKUP_CACHE = 0x200000, /* Consult the local cache before using DNS */
+  PCI_LOOKUP_REFRESH_CACHE = 0x400000, /* Forget all previously cached entries, but still allow updating the cache */
+};
+
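+/*
+ * Editor's note: an illustrative end-to-end sketch, not part of the
+ * original header, showing how the pieces above fit together: allocate,
+ * initialize, scan the bus, fill in per-device information, and resolve
+ * names from the ID database:
+ */
+#if 0 /* illustration only */
+#include <stdio.h>
+#include <pci/pci.h>
+
+int main(void)
+{
+  struct pci_access *acc = pci_alloc(); /* set access options here if needed */
+  struct pci_dev *d;
+  char buf[128];
+
+  pci_init(acc);
+  pci_scan_bus(acc);
+  for (d = acc->devices; d; d = d->next)
+    {
+      pci_fill_info(d, PCI_FILL_IDENT | PCI_FILL_CLASS);
+      printf("%04x:%02x:%02x.%d %s\n", d->domain, d->bus, d->dev, d->func,
+             pci_lookup_name(acc, buf, sizeof(buf),
+                             PCI_LOOKUP_VENDOR | PCI_LOOKUP_DEVICE,
+                             d->vendor_id, d->device_id));
+    }
+  pci_cleanup(acc);
+  return 0;
+}
+#endif
+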
+#endif
diff --git a/ext/hwloc/include/pci/types.h b/ext/hwloc/include/pci/types.h new file mode 100644 index 000000000..4d23e692b --- /dev/null +++ b/ext/hwloc/include/pci/types.h @@ -0,0 +1,65 @@
+/*
+ * The PCI Library -- Types and Format Strings
+ *
+ * Copyright (c) 1997--2008 Martin Mares
+ *
+ * Can be freely distributed and used under the terms of the GNU GPL.
+ */
+
+#include <sys/types.h>
+
+#ifndef PCI_HAVE_Uxx_TYPES
+
+#ifdef PCI_OS_WINDOWS
+#include <windows.h>
+typedef BYTE u8;
+typedef WORD u16;
+typedef DWORD u32;
+#elif defined(PCI_HAVE_STDINT_H) || (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
+#include <stdint.h>
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+#else
+typedef u_int8_t u8;
+typedef u_int16_t u16;
+typedef u_int32_t u32;
+#endif
+
+#ifdef PCI_HAVE_64BIT_ADDRESS
+#include <limits.h>
+#if ULONG_MAX > 0xffffffff
+typedef unsigned long u64;
+#define PCI_U64_FMT "l"
+#else
+typedef unsigned long long u64;
+#define PCI_U64_FMT "ll"
+#endif
+#endif
+
+#endif /* PCI_HAVE_Uxx_TYPES */
+
+#ifdef PCI_HAVE_64BIT_ADDRESS
+typedef u64 pciaddr_t;
+#define PCIADDR_T_FMT "%08" PCI_U64_FMT "x"
+#define PCIADDR_PORT_FMT "%04" PCI_U64_FMT "x"
+#else
+typedef u32 pciaddr_t;
+#define PCIADDR_T_FMT "%08x"
+#define PCIADDR_PORT_FMT "%04x"
+#endif
+
+#ifdef PCI_ARCH_SPARC64
+/* On sparc64 Linux the kernel reports remapped port addresses and IRQ numbers */
+#undef PCIADDR_PORT_FMT
+#define PCIADDR_PORT_FMT PCIADDR_T_FMT
+#define PCIIRQ_FMT "%08x"
+#else
+#define PCIIRQ_FMT "%d"
+#endif
+
+#if defined(__GNUC__) && __GNUC__ > 2
+#define PCI_PRINTF(x,y) __attribute__((format(printf, x, y)))
+#else
+#define PCI_PRINTF(x,y)
+#endif
diff --git a/ext/hwloc/include/private/autogen/config.h b/ext/hwloc/include/private/autogen/config.h new file mode 100644 index 000000000..6f440d09b --- /dev/null +++ b/ext/hwloc/include/private/autogen/config.h @@ -0,0 +1,684 @@
+/* include/private/autogen/config.h. Generated from config.h.in by configure. */
+/* include/private/autogen/config.h.in. Generated from configure.ac by autoheader. */
+
+/* -*- c -*-
+ *
+ * Copyright © 2009, 2011, 2012 CNRS, inria., Université Bordeaux 1 All rights reserved.
+ * Copyright © 2009 Cisco Systems, Inc. All rights reserved.
+ * $COPYRIGHT$
+ *
+ * Additional copyrights may follow
+ *
+ * $HEADER$
+ *
+ * This file is automatically generated by configure. Edits will be lost
+ * the next time you run configure!
+ */
+
+#ifndef HWLOC_CONFIGURE_H
+#define HWLOC_CONFIGURE_H
+
+
+/* Define to 1 if the system has the type `CACHE_DESCRIPTOR'. */
+/* #undef HAVE_CACHE_DESCRIPTOR */
+
+/* Define to 1 if the system has the type `CACHE_RELATIONSHIP'. */
+/* #undef HAVE_CACHE_RELATIONSHIP */
+
+/* Define to 1 if you have the `clz' function. */
+/* #undef HAVE_CLZ */
+
+/* Define to 1 if you have the `clzl' function. */
+/* #undef HAVE_CLZL */
+
+/* Define to 1 if you have the <CL/cl_ext.h> header file. */
+/* #undef HAVE_CL_CL_EXT_H */
+
+/* Define to 1 if you have the `cpuset_setaffinity' function. */
+/* #undef HAVE_CPUSET_SETAFFINITY */
+
+/* Define to 1 if you have the `cpuset_setid' function. */
+/* #undef HAVE_CPUSET_SETID */
+
+/* Define to 1 if we have -lcuda */
+/* #undef HAVE_CUDA */
+
+/* Define to 1 if you have the <cuda.h> header file. */
+/* #undef HAVE_CUDA_H */
+
+/* Define to 1 if you have the <cuda_runtime_api.h> header file. */
+/* #undef HAVE_CUDA_RUNTIME_API_H */
+
+/* Define to 1 if you have the declaration of `CL_DEVICE_TOPOLOGY_AMD', and to
+   0 if you don't. */
+/* #undef HAVE_DECL_CL_DEVICE_TOPOLOGY_AMD */
+
+/* Define to 1 if you have the declaration of `CTL_HW', and to 0 if you don't.
+   */
+#define HAVE_DECL_CTL_HW 0
+
+/* Define to 1 if you have the declaration of `fabsf', and to 0 if you don't.
+   */
+#define HAVE_DECL_FABSF 1
+
+/* Define to 1 if you have the declaration of `HW_NCPU', and to 0 if you
+   don't. */
+#define HAVE_DECL_HW_NCPU 0
+
+/* Define to 1 if you have the declaration of
+   `nvmlDeviceGetMaxPcieLinkGeneration', and to 0 if you don't. */
+/* #undef HAVE_DECL_NVMLDEVICEGETMAXPCIELINKGENERATION */
+
+/* Define to 1 if you have the declaration of `PCI_LOOKUP_NO_NUMBERS', and to
+   0 if you don't. */
+/* #undef HAVE_DECL_PCI_LOOKUP_NO_NUMBERS */
+
+/* Define to 1 if you have the declaration of `pthread_getaffinity_np', and to
+   0 if you don't. */
+#define HAVE_DECL_PTHREAD_GETAFFINITY_NP 1
+
+/* Define to 1 if you have the declaration of `pthread_setaffinity_np', and to
+   0 if you don't. */
+#define HAVE_DECL_PTHREAD_SETAFFINITY_NP 1
+
+/* Define to 1 if you have the declaration of `strtoull', and to 0 if you
+   don't. */
+#define HAVE_DECL_STRTOULL 1
+
+/* Define to 1 if you have the declaration of `_SC_LARGE_PAGESIZE', and to 0
+   if you don't. */
+#define HAVE_DECL__SC_LARGE_PAGESIZE 0
+
+/* Define to 1 if you have the declaration of `_SC_NPROCESSORS_CONF', and to 0
+   if you don't. */
+#define HAVE_DECL__SC_NPROCESSORS_CONF 1
+
+/* Define to 1 if you have the declaration of `_SC_NPROCESSORS_ONLN', and to 0
+   if you don't. */
+#define HAVE_DECL__SC_NPROCESSORS_ONLN 1
+
+/* Define to 1 if you have the declaration of `_SC_NPROC_CONF', and to 0 if
+   you don't. */
+#define HAVE_DECL__SC_NPROC_CONF 0
+
+/* Define to 1 if you have the declaration of `_SC_NPROC_ONLN', and to 0 if
+   you don't. */
+#define HAVE_DECL__SC_NPROC_ONLN 0
+
+/* Define to 1 if you have the declaration of `_SC_PAGESIZE', and to 0 if you
+   don't. */
+#define HAVE_DECL__SC_PAGESIZE 1
+
+/* Define to 1 if you have the declaration of `_SC_PAGE_SIZE', and to 0 if you
+   don't. */
+#define HAVE_DECL__SC_PAGE_SIZE 1
+
+/* Define to 1 if you have the <dirent.h> header file. */
+#define HAVE_DIRENT_H 1
+
+/* Define to 1 if you have the <dlfcn.h> header file. */
+#define HAVE_DLFCN_H 1
+
+/* Define to 1 if you have the `ffs' function. */
+#define HAVE_FFS 1
+
+/* Define to 1 if you have the `ffsl' function. */
+#define HAVE_FFSL 1
+
+/* Define to 1 if you have the `fls' function. */
+/* #undef HAVE_FLS */
+
+/* Define to 1 if you have the `flsl' function. */
+/* #undef HAVE_FLSL */
+
+/* Define to 1 if you have the `getpagesize' function. */
+#define HAVE_GETPAGESIZE 1
+
+/* Define to 1 if the system has the type `GROUP_AFFINITY'. */
+/* #undef HAVE_GROUP_AFFINITY */
+
+/* Define to 1 if the system has the type `GROUP_RELATIONSHIP'. */
+/* #undef HAVE_GROUP_RELATIONSHIP */
+
+/* Define to 1 if you have the `host_info' function. */
+/* #undef HAVE_HOST_INFO */
+
+/* Define to 1 if you have the <infiniband/verbs.h> header file. */
+#define HAVE_INFINIBAND_VERBS_H 1
+
+/* Define to 1 if you have the <inttypes.h> header file. */
+#define HAVE_INTTYPES_H 1
+
+/* Define to 1 if the system has the type `KAFFINITY'. */
+/* #undef HAVE_KAFFINITY */
+
+/* Define to 1 if you have the <kstat.h> header file. */
+/* #undef HAVE_KSTAT_H */
+
+/* Define to 1 if you have the <langinfo.h> header file. */
+#define HAVE_LANGINFO_H 1
+
+/* Define to 1 if we have -lgdi32 */
+/* #undef HAVE_LIBGDI32 */
+
+/* Define to 1 if we have -libverbs */
+#define HAVE_LIBIBVERBS 1
+
+/* Define to 1 if we have -lkstat */
+/* #undef HAVE_LIBKSTAT */
+
+/* Define to 1 if we have -llgrp */
+/* #undef HAVE_LIBLGRP */
+
+/* Define to 1 if you have the `pci' library (-lpci). */
+/* #undef HAVE_LIBPCI */
+
+/* Define to 1 if you have the <locale.h> header file. */
+#define HAVE_LOCALE_H 1
+
+/* Define to 1 if the system has the type `LOGICAL_PROCESSOR_RELATIONSHIP'. */
+/* #undef HAVE_LOGICAL_PROCESSOR_RELATIONSHIP */
+
+/* Define to 1 if you have the <mach/mach_host.h> header file. */
+/* #undef HAVE_MACH_MACH_HOST_H */
+
+/* Define to 1 if you have the <mach/mach_init.h> header file. */
+/* #undef HAVE_MACH_MACH_INIT_H */
+
+/* Define to 1 if you have the <malloc.h> header file. */
+#define HAVE_MALLOC_H 1
+
+/* Define to 1 if you have the `memalign' function. */
+#define HAVE_MEMALIGN 1
+
+/* Define to 1 if you have the <memory.h> header file. */
+#define HAVE_MEMORY_H 1
+
+/* Define to 1 if we have -lmyriexpress */
+/* #undef HAVE_MYRIEXPRESS */
+
+/* Define to 1 if you have the <myriexpress.h> header file. */
+/* #undef HAVE_MYRIEXPRESS_H */
+
+/* Define to 1 if you have the `nl_langinfo' function. */
+#define HAVE_NL_LANGINFO 1
+
+/* Define to 1 if you have the <numaif.h> header file. */
+/* #undef HAVE_NUMAIF_H */
+
+/* Define to 1 if the system has the type `NUMA_NODE_RELATIONSHIP'. */
+/* #undef HAVE_NUMA_NODE_RELATIONSHIP */
+
+/* Define to 1 if you have the <NVCtrl/NVCtrl.h> header file. */
+/* #undef HAVE_NVCTRL_NVCTRL_H */
+
+/* Define to 1 if you have the <nvml.h> header file. */
+/* #undef HAVE_NVML_H */
+
+/* Define to 1 if you have the `openat' function. */
+#define HAVE_OPENAT 1
+
+/* Define to 1 if you have the <pci/pci.h> header file. */
+/* #undef HAVE_PCI_PCI_H */
+
+/* Define to 1 if you have the <picl.h> header file. */
+/* #undef HAVE_PICL_H */
+
+/* Define to 1 if you have the `posix_memalign' function. */
+#define HAVE_POSIX_MEMALIGN 1
+
+/* Define to 1 if the system has the type `PROCESSOR_CACHE_TYPE'. */
+/* #undef HAVE_PROCESSOR_CACHE_TYPE */
+
+/* Define to 1 if the system has the type `PROCESSOR_GROUP_INFO'. */
+/* #undef HAVE_PROCESSOR_GROUP_INFO */
+
+/* Define to 1 if the system has the type `PROCESSOR_RELATIONSHIP'. */
+/* #undef HAVE_PROCESSOR_RELATIONSHIP */
+
+/* Define to 1 if the system has the type `PSAPI_WORKING_SET_EX_BLOCK'. */
+/* #undef HAVE_PSAPI_WORKING_SET_EX_BLOCK */
+
+/* Define to 1 if the system has the type `PSAPI_WORKING_SET_EX_INFORMATION'.
+   */
+/* #undef HAVE_PSAPI_WORKING_SET_EX_INFORMATION */
+
+/* Define to 1 if you have the <pthread_np.h> header file. */
+/* #undef HAVE_PTHREAD_NP_H */
+
+/* Define to 1 if the system has the type `pthread_t'. */
+#define HAVE_PTHREAD_T 1
+
+/* Define to 1 if you have the `putwc' function. */
+#define HAVE_PUTWC 1
+
+/* Define to 1 if the system has the type `RelationProcessorPackage'. */
+/* #undef HAVE_RELATIONPROCESSORPACKAGE */
+
+/* Define to 1 if you have the `setlocale' function. */
+#define HAVE_SETLOCALE 1
+
+/* Define to 1 if you have the <stdint.h> header file. */
+#define HAVE_STDINT_H 1
+
+/* Define to 1 if you have the <stdlib.h> header file. */
+#define HAVE_STDLIB_H 1
+
+/* Define to 1 if you have the `strftime' function. */
+#define HAVE_STRFTIME 1
+
+/* Define to 1 if you have the <strings.h> header file. */
+#define HAVE_STRINGS_H 1
+
+/* Define to 1 if you have the <string.h> header file. */
+#define HAVE_STRING_H 1
+
+/* Define to 1 if you have the `strncasecmp' function. */
+#define HAVE_STRNCASECMP 1
+
+/* Define to '1' if sysctl is present and usable */
+#define HAVE_SYSCTL 1
+
+/* Define to '1' if sysctlbyname is present and usable */
+/* #undef HAVE_SYSCTLBYNAME */
+
+/* Define to 1 if the system has the type
+   `SYSTEM_LOGICAL_PROCESSOR_INFORMATION'. */
+/* #undef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION */
+
+/* Define to 1 if the system has the type
+   `SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX'. */
+/* #undef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX */
+
+/* Define to 1 if you have the <sys/cpuset.h> header file. */
+/* #undef HAVE_SYS_CPUSET_H */
+
+/* Define to 1 if you have the <sys/lgrp_user.h> header file. */
+/* #undef HAVE_SYS_LGRP_USER_H */ + +/* Define to 1 if you have the <sys/mman.h> header file. */ +#define HAVE_SYS_MMAN_H 1 + +/* Define to 1 if you have the <sys/param.h> header file. */ +#define HAVE_SYS_PARAM_H 1 + +/* Define to 1 if you have the <sys/stat.h> header file. */ +#define HAVE_SYS_STAT_H 1 + +/* Define to 1 if you have the <sys/sysctl.h> header file. */ +#define HAVE_SYS_SYSCTL_H 1 + +/* Define to 1 if you have the <sys/types.h> header file. */ +#define HAVE_SYS_TYPES_H 1 + +/* Define to 1 if you have the <sys/utsname.h> header file. */ +#define HAVE_SYS_UTSNAME_H 1 + +/* Define to 1 if you have the `uname' function. */ +#define HAVE_UNAME 1 + +/* Define to 1 if you have the <unistd.h> header file. */ +#define HAVE_UNISTD_H 1 + +/* Define to 1 if you have the `uselocale' function. */ +#define HAVE_USELOCALE 1 + +/* Define to 1 if the system has the type `wchar_t'. */ +#define HAVE_WCHAR_T 1 + +/* Define to 1 if you have the <X11/keysym.h> header file. */ +#define HAVE_X11_KEYSYM_H 1 + +/* Define to 1 if you have the <X11/Xlib.h> header file. */ +#define HAVE_X11_XLIB_H 1 + +/* Define to 1 if you have the <X11/Xutil.h> header file. */ +#define HAVE_X11_XUTIL_H 1 + +/* Define to 1 if you have the <xlocale.h> header file. */ +#define HAVE_XLOCALE_H 1 + +/* Define to 1 on AIX */ +/* #undef HWLOC_AIX_SYS */ + +/* Define to 1 on BlueGene/Q */ +/* #undef HWLOC_BGQ_SYS */ + +/* Whether C compiler supports symbol visibility or not */ +#define HWLOC_C_HAVE_VISIBILITY 1 + +/* Define to 1 on Darwin */ +/* #undef HWLOC_DARWIN_SYS */ + +/* Whether we are in debugging mode or not */ +/* #undef HWLOC_DEBUG */ + +/* Define to 1 on *FREEBSD */ +/* #undef HWLOC_FREEBSD_SYS */ + +/* Whether your compiler has __attribute__ or not */ +#define HWLOC_HAVE_ATTRIBUTE 1 + +/* Whether your compiler has __attribute__ aligned or not */ +#define HWLOC_HAVE_ATTRIBUTE_ALIGNED 1 + +/* Whether your compiler has __attribute__ always_inline or not */ +#define HWLOC_HAVE_ATTRIBUTE_ALWAYS_INLINE 1 + +/* Whether your compiler has __attribute__ cold or not */ +#define HWLOC_HAVE_ATTRIBUTE_COLD 1 + +/* Whether your compiler has __attribute__ const or not */ +#define HWLOC_HAVE_ATTRIBUTE_CONST 1 + +/* Whether your compiler has __attribute__ deprecated or not */ +#define HWLOC_HAVE_ATTRIBUTE_DEPRECATED 1 + +/* Whether your compiler has __attribute__ format or not */ +#define HWLOC_HAVE_ATTRIBUTE_FORMAT 1 + +/* Whether your compiler has __attribute__ hot or not */ +#define HWLOC_HAVE_ATTRIBUTE_HOT 1 + +/* Whether your compiler has __attribute__ malloc or not */ +#define HWLOC_HAVE_ATTRIBUTE_MALLOC 1 + +/* Whether your compiler has __attribute__ may_alias or not */ +#define HWLOC_HAVE_ATTRIBUTE_MAY_ALIAS 1 + +/* Whether your compiler has __attribute__ nonnull or not */ +#define HWLOC_HAVE_ATTRIBUTE_NONNULL 1 + +/* Whether your compiler has __attribute__ noreturn or not */ +#define HWLOC_HAVE_ATTRIBUTE_NORETURN 1 + +/* Whether your compiler has __attribute__ no_instrument_function or not */ +#define HWLOC_HAVE_ATTRIBUTE_NO_INSTRUMENT_FUNCTION 1 + +/* Whether your compiler has __attribute__ packed or not */ +#define HWLOC_HAVE_ATTRIBUTE_PACKED 1 + +/* Whether your compiler has __attribute__ pure or not */ +#define HWLOC_HAVE_ATTRIBUTE_PURE 1 + +/* Whether your compiler has __attribute__ sentinel or not */ +#define HWLOC_HAVE_ATTRIBUTE_SENTINEL 1 + +/* Whether your compiler has __attribute__ unused or not */ +#define HWLOC_HAVE_ATTRIBUTE_UNUSED 1 + +/* Whether your compiler has __attribute__ warn unused result or not */ +#define HWLOC_HAVE_ATTRIBUTE_WARN_UNUSED_RESULT 1 + +/* Whether your compiler has __attribute__ weak alias or not */ +#define 
HWLOC_HAVE_ATTRIBUTE_WEAK_ALIAS 1 + +/* Define to 1 if your `ffs' function is known to be broken. */ +/* #undef HWLOC_HAVE_BROKEN_FFS */ + +/* Define to 1 if you have the `cairo' library. */ +/* #undef HWLOC_HAVE_CAIRO */ + +/* Define to 1 if you have the `clz' function. */ +/* #undef HWLOC_HAVE_CLZ */ + +/* Define to 1 if you have the `clzl' function. */ +/* #undef HWLOC_HAVE_CLZL */ + +/* Define to 1 if you have cpuid */ +#define HWLOC_HAVE_CPUID 1 + +/* Define to 1 if the CPU_SET macro works */ +#define HWLOC_HAVE_CPU_SET 1 + +/* Define to 1 if the CPU_SET_S macro works */ +#define HWLOC_HAVE_CPU_SET_S 1 + +/* Define to 1 if you have the `cudart' SDK. */ +/* #undef HWLOC_HAVE_CUDART */ + +/* Define to 1 if function `clz' is declared by system headers */ +/* #undef HWLOC_HAVE_DECL_CLZ */ + +/* Define to 1 if function `clzl' is declared by system headers */ +/* #undef HWLOC_HAVE_DECL_CLZL */ + +/* Define to 1 if function `ffs' is declared by system headers */ +#define HWLOC_HAVE_DECL_FFS 1 + +/* Define to 1 if function `ffsl' is declared by system headers */ +#define HWLOC_HAVE_DECL_FFSL 1 + +/* Define to 1 if function `fls' is declared by system headers */ +/* #undef HWLOC_HAVE_DECL_FLS */ + +/* Define to 1 if function `flsl' is declared by system headers */ +/* #undef HWLOC_HAVE_DECL_FLSL */ + +/* Define to 1 if you have the `ffs' function. */ +#define HWLOC_HAVE_FFS 1 + +/* Define to 1 if you have the `ffsl' function. */ +#define HWLOC_HAVE_FFSL 1 + +/* Define to 1 if you have the `fls' function. */ +/* #undef HWLOC_HAVE_FLS */ + +/* Define to 1 if you have the `flsl' function. */ +/* #undef HWLOC_HAVE_FLSL */ + +/* Define to 1 if you have the GL module components. */ +/* #undef HWLOC_HAVE_GL */ + +/* Define to 1 if you have the `libpciaccess' library. */ +/* #undef HWLOC_HAVE_LIBPCIACCESS */ + +/* Define to 1 if you have a library providing the termcap interface */ +#define HWLOC_HAVE_LIBTERMCAP 1 + +/* Define to 1 if you have the `libxml2' library. */ +#define HWLOC_HAVE_LIBXML2 1 + +/* Define to 1 if building the Linux PCI component */ +#define HWLOC_HAVE_LINUXPCI 1 + +/* Define to 1 if mbind is available. */ +/* #undef HWLOC_HAVE_MBIND */ + +/* Define to 1 if migrate_pages is available. */ +/* #undef HWLOC_HAVE_MIGRATE_PAGES */ + +/* Define to 1 if you have the `NVML' library. */ +/* #undef HWLOC_HAVE_NVML */ + +/* Define to 1 if glibc provides the old prototype (without length) of + sched_setaffinity() */ +/* #undef HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + +/* Define to 1 if you have the `OpenCL' library. */ +/* #undef HWLOC_HAVE_OPENCL */ + +/* Define to 1 if `libpci' struct pci_dev has a `device_class' field. */ +/* #undef HWLOC_HAVE_PCIDEV_DEVICE_CLASS */ + +/* Define to 1 if `libpci' struct pci_dev has a `domain' field. */ +/* #undef HWLOC_HAVE_PCIDEV_DOMAIN */ + +/* Define to 1 if you have the pciutils `libpci' library. */ +/* #undef HWLOC_HAVE_PCIUTILS */ + +/* Define to 1 if `libpci' has the `pci_find_cap' function. */ +/* #undef HWLOC_HAVE_PCI_FIND_CAP */ + +/* Define to 1 if the hwloc library should support dynamically-loaded plugins + */ +/* #undef HWLOC_HAVE_PLUGINS */ + +/* `Define to 1 if you have pthread_getthrds_np' */ +/* #undef HWLOC_HAVE_PTHREAD_GETTHRDS_NP */ + +/* Define to 1 if pthread mutexes are available */ +#define HWLOC_HAVE_PTHREAD_MUTEX 1 + +/* Define to 1 if glibc provides a prototype of sched_setaffinity() */ +#define HWLOC_HAVE_SCHED_SETAFFINITY 1 + +/* Define to 1 if set_mempolicy is available. 
*/ +/* #undef HWLOC_HAVE_SET_MEMPOLICY */ + +/* Define to 1 if you have the <stdint.h> header file. */ +#define HWLOC_HAVE_STDINT_H 1 + +/* Define to 1 if you have the `windows.h' header. */ +/* #undef HWLOC_HAVE_WINDOWS_H */ + +/* Define to 1 if X11 headers including Xutil.h and keysym.h are available. */ +#define HWLOC_HAVE_X11_KEYSYM 1 + +/* Define to 1 if the _syscall3 macro works */ +/* #undef HWLOC_HAVE__SYSCALL3 */ + +/* Define to 1 on HP-UX */ +/* #undef HWLOC_HPUX_SYS */ + +/* Define to 1 on Irix */ +/* #undef HWLOC_IRIX_SYS */ + +/* Define to 1 on Linux */ +#define HWLOC_LINUX_SYS 1 + +/* Define to 1 on *NETBSD */ +/* #undef HWLOC_NETBSD_SYS */ + +/* Define to 1 on OSF */ +/* #undef HWLOC_OSF_SYS */ + +/* The size of `unsigned int', as computed by sizeof */ +#define HWLOC_SIZEOF_UNSIGNED_INT 4 + +/* The size of `unsigned long', as computed by sizeof */ +#define HWLOC_SIZEOF_UNSIGNED_LONG 8 + +/* Define to 1 on Solaris */ +/* #undef HWLOC_SOLARIS_SYS */ + +/* The hwloc symbol prefix */ +#define HWLOC_SYM_PREFIX hwloc_ + +/* The hwloc symbol prefix in all caps */ +#define HWLOC_SYM_PREFIX_CAPS HWLOC_ + +/* Whether we need to re-define all the hwloc public symbols or not */ +#define HWLOC_SYM_TRANSFORM 0 + +/* Define to 1 on unsupported systems */ +/* #undef HWLOC_UNSUPPORTED_SYS */ + +/* Define to 1 if ncurses works, preferred over curses */ +#define HWLOC_USE_NCURSES 1 + +/* Define to 1 on WINDOWS */ +/* #undef HWLOC_WIN_SYS */ + +/* Define to 1 on x86_32 */ +/* #undef HWLOC_X86_32_ARCH */ + +/* Define to 1 on x86_64 */ +#define HWLOC_X86_64_ARCH 1 + +/* Define to the sub-directory in which libtool stores uninstalled libraries. + */ +#define LT_OBJDIR ".libs/" + +/* Name of package */ +#define PACKAGE "hwloc" + +/* Define to the address where bug reports for this package should be sent. */ +#define PACKAGE_BUGREPORT "http://www.open-mpi.org/projects/hwloc/" + +/* Define to the full name of this package. */ +#define PACKAGE_NAME "hwloc" + +/* Define to the full name and version of this package. */ +#define PACKAGE_STRING "hwloc 1.8.1" + +/* Define to the one symbol short name of this package. */ +#define PACKAGE_TARNAME "hwloc" + +/* Define to the home page for this package. */ +#define PACKAGE_URL "" + +/* Define to the version of this package. */ +#define PACKAGE_VERSION "1.8.1" + +/* The size of `unsigned int', as computed by sizeof. */ +#define SIZEOF_UNSIGNED_INT 4 + +/* The size of `unsigned long', as computed by sizeof. */ +#define SIZEOF_UNSIGNED_LONG 8 + +/* The size of `void *', as computed by sizeof. */ +#define SIZEOF_VOID_P 8 + +/* Define to 1 if you have the ANSI C header files. */ +#define STDC_HEADERS 1 + +/* Enable extensions on HP-UX. */ +#ifndef _HPUX_SOURCE +# define _HPUX_SOURCE 1 +#endif + + +/* Enable extensions on AIX 3, Interix. */ +#ifndef _ALL_SOURCE +# define _ALL_SOURCE 1 +#endif +/* Enable GNU extensions on systems that have them. */ +#ifndef _GNU_SOURCE +# define _GNU_SOURCE 1 +#endif +/* Enable threading extensions on Solaris. */ +#ifndef _POSIX_PTHREAD_SEMANTICS +# define _POSIX_PTHREAD_SEMANTICS 1 +#endif +/* Enable extensions on HP NonStop. */ +#ifndef _TANDEM_SOURCE +# define _TANDEM_SOURCE 1 +#endif +/* Enable general extensions on Solaris. */ +#ifndef __EXTENSIONS__ +# define __EXTENSIONS__ 1 +#endif + + +/* Version number of package */ +#define VERSION "1.8.1" + +/* Define to 1 if the X Window System is missing or not being used. */ +/* #undef X_DISPLAY_MISSING */ + +/* Are we building for HP-UX? */ +#define _HPUX_SOURCE 1 + +/* Define to 1 if on MINIX. 
*/ +/* #undef _MINIX */ + +/* Define to 2 if the system does not provide POSIX.1 features except with + this defined. */ +/* #undef _POSIX_1_SOURCE */ + +/* Define to 1 if you need to in order for `stat' and other things to work. */ +/* #undef _POSIX_SOURCE */ + +/* Define this to the process ID type */ +#define hwloc_pid_t pid_t + +/* Define this to either strncasecmp or strncmp */ +#define hwloc_strncasecmp strncasecmp + +/* Define this to the thread ID type */ +#define hwloc_thread_t pthread_t + + +#endif /* HWLOC_CONFIGURE_H */ + diff --git a/ext/hwloc/include/private/components.h b/ext/hwloc/include/private/components.h new file mode 100644 index 000000000..b36634535 --- /dev/null +++ b/ext/hwloc/include/private/components.h @@ -0,0 +1,40 @@ +/* + * Copyright © 2012 Inria. All rights reserved. + * See COPYING in top-level directory. + */ + + +#ifdef HWLOC_INSIDE_PLUGIN +/* + * these declarations are internal only, they are not available to plugins + * (many functions below are internal static symbols). + */ +#error This file should not be used in plugins +#endif + + +#ifndef PRIVATE_COMPONENTS_H +#define PRIVATE_COMPONENTS_H 1 + +#include + +struct hwloc_topology; + +extern int hwloc_disc_component_force_enable(struct hwloc_topology *topology, + int envvar_forced, /* 1 if forced through envvar, 0 if forced through API */ + int type, const char *name, + const void *data1, const void *data2, const void *data3); +extern void hwloc_disc_components_enable_others(struct hwloc_topology *topology); + +/* Compute the topology is_thissystem flag based on enabled backends */ +extern void hwloc_backends_is_thissystem(struct hwloc_topology *topology); + +/* Disable and destroy all backends used by a topology */ +extern void hwloc_backends_disable_all(struct hwloc_topology *topology); + +/* Used by the core to setup/destroy the list of components */ +extern void hwloc_components_init(struct hwloc_topology *topology); /* increases components refcount, should be called exactly once per topology (during init) */ +extern void hwloc_components_destroy_all(struct hwloc_topology *topology); /* decreases components refcount, should be called exactly once per topology (during destroy) */ + +#endif /* PRIVATE_COMPONENTS_H */ + diff --git a/ext/hwloc/include/private/cpuid.h b/ext/hwloc/include/private/cpuid.h new file mode 100644 index 000000000..214ab3827 --- /dev/null +++ b/ext/hwloc/include/private/cpuid.h @@ -0,0 +1,80 @@ +/* + * Copyright © 2010-2012 Université Bordeaux 1 + * Copyright © 2010 Cisco Systems, Inc. All rights reserved. + * Copyright © 2014 Inria. All rights reserved. + * + * See COPYING in top-level directory. + */ + +/* Internals for x86's cpuid. */ + +#ifndef HWLOC_PRIVATE_CPUID_H +#define HWLOC_PRIVATE_CPUID_H + +#ifdef HWLOC_X86_32_ARCH +static __hwloc_inline int hwloc_have_cpuid(void) +{ + int ret; + unsigned tmp, tmp2; + asm( + "mov $0,%0\n\t" /* Not supported a priori */ + + "pushfl \n\t" /* Save flags */ + + "pushfl \n\t" \ + "pop %1 \n\t" /* Get flags */ \ + +#define TRY_TOGGLE \ + "xor $0x00200000,%1\n\t" /* Try to toggle ID */ \ + "mov %1,%2\n\t" /* Save expected value */ \ + "push %1 \n\t" \ + "popfl \n\t" /* Try to toggle */ \ + "pushfl \n\t" \ + "pop %1 \n\t" \ + "cmp %1,%2\n\t" /* Compare with expected value */ \ + "jnz Lhwloc1\n\t" /* Unexpected, failure */ \ + + TRY_TOGGLE /* Try to set/clear */ + TRY_TOGGLE /* Try to clear/set */ + + "mov $1,%0\n\t" /* Passed the test! 
*/ + + "Lhwloc1: \n\t" + "popfl \n\t" /* Restore flags */ + + : "=r" (ret), "=&r" (tmp), "=&r" (tmp2)); + return ret; +} +#endif /* HWLOC_X86_32_ARCH */ +#ifdef HWLOC_X86_64_ARCH +static __hwloc_inline int hwloc_have_cpuid(void) { return 1; } +#endif /* HWLOC_X86_64_ARCH */ + +static __hwloc_inline void hwloc_cpuid(unsigned *eax, unsigned *ebx, unsigned *ecx, unsigned *edx) +{ + /* Note: gcc might want to use bx or the stack for %1 addressing, so we can't + * use them :/ */ +#ifdef HWLOC_X86_64_ARCH + hwloc_uint64_t sav_rbx; + asm( + "mov %%rbx,%2\n\t" + "cpuid\n\t" + "xchg %2,%%rbx\n\t" + "movl %k2,%1\n\t" + : "+a" (*eax), "=m" (*ebx), "=&r"(sav_rbx), + "+c" (*ecx), "=&d" (*edx)); +#elif defined(HWLOC_X86_32_ARCH) + unsigned long sav_ebx; + asm( + "mov %%ebx,%2\n\t" + "cpuid\n\t" + "xchg %2,%%ebx\n\t" + "movl %k2,%1\n\t" + : "+a" (*eax), "=m" (*ebx), "=&r"(sav_ebx), + "+c" (*ecx), "=&d" (*edx)); +#else +#error unknown architecture +#endif +} + +#endif /* HWLOC_PRIVATE_CPUID_H */ diff --git a/ext/hwloc/include/private/debug.h b/ext/hwloc/include/private/debug.h new file mode 100644 index 000000000..b327bf2a6 --- /dev/null +++ b/ext/hwloc/include/private/debug.h @@ -0,0 +1,57 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2012 Inria. All rights reserved. + * Copyright © 2009, 2011 Université Bordeaux 1 + * Copyright © 2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/* The configuration file */ + +#ifndef HWLOC_DEBUG_H +#define HWLOC_DEBUG_H + +#include + +#ifdef HWLOC_DEBUG +#include +#include +#endif + +static __hwloc_inline void hwloc_debug(const char *s __hwloc_attribute_unused, ...) +{ +#ifdef HWLOC_DEBUG + va_list ap; + + va_start(ap, s); + vfprintf(stderr, s, ap); + va_end(ap); +#endif +} + +#ifdef HWLOC_DEBUG +#define hwloc_debug_bitmap(fmt, bitmap) do { \ + char *s; \ + hwloc_bitmap_asprintf(&s, bitmap); \ + fprintf(stderr, fmt, s); \ + free(s); \ +} while (0) +#define hwloc_debug_1arg_bitmap(fmt, arg1, bitmap) do { \ + char *s; \ + hwloc_bitmap_asprintf(&s, bitmap); \ + fprintf(stderr, fmt, arg1, s); \ + free(s); \ +} while (0) +#define hwloc_debug_2args_bitmap(fmt, arg1, arg2, bitmap) do { \ + char *s; \ + hwloc_bitmap_asprintf(&s, bitmap); \ + fprintf(stderr, fmt, arg1, arg2, s); \ + free(s); \ +} while (0) +#else +#define hwloc_debug_bitmap(s, bitmap) do { } while(0) +#define hwloc_debug_1arg_bitmap(s, arg1, bitmap) do { } while(0) +#define hwloc_debug_2args_bitmap(s, arg1, arg2, bitmap) do { } while(0) +#endif + +#endif /* HWLOC_DEBUG_H */ diff --git a/ext/hwloc/include/private/misc.h b/ext/hwloc/include/private/misc.h new file mode 100644 index 000000000..3f4c95c33 --- /dev/null +++ b/ext/hwloc/include/private/misc.h @@ -0,0 +1,357 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2010 inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +/* Misc macros and inlines. */ + +#ifndef HWLOC_PRIVATE_MISC_H +#define HWLOC_PRIVATE_MISC_H + +#include +#include + +/* Compile-time assertion */ +#define HWLOC_BUILD_ASSERT(condition) ((void)sizeof(char[1 - 2*!(condition)])) + +#define HWLOC_BITS_PER_LONG (HWLOC_SIZEOF_UNSIGNED_LONG * 8) +#define HWLOC_BITS_PER_INT (HWLOC_SIZEOF_UNSIGNED_INT * 8) + +#if (HWLOC_BITS_PER_LONG != 32) && (HWLOC_BITS_PER_LONG != 64) +#error "unknown size for unsigned long." 
+ + +/** + * ffsl helpers. + */ + +#if defined(HWLOC_HAVE_BROKEN_FFS) + +/* System has a broken ffs(). + * We must check this before __GNUC__ or HWLOC_HAVE_FFSL. + */ +# define HWLOC_NO_FFS + +#elif defined(__GNUC__) + +# if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4)) + /* Starting from 3.4, gcc has a long variant. */ +# define hwloc_ffsl(x) __builtin_ffsl(x) +# else +# define hwloc_ffs(x) __builtin_ffs(x) +# define HWLOC_NEED_FFSL +# endif + +#elif defined(HWLOC_HAVE_FFSL) + +# ifndef HWLOC_HAVE_DECL_FFSL +extern int ffsl(long) __hwloc_attribute_const; +# endif + +# define hwloc_ffsl(x) ffsl(x) + +#elif defined(HWLOC_HAVE_FFS) + +# ifndef HWLOC_HAVE_DECL_FFS +extern int ffs(int) __hwloc_attribute_const; +# endif + +# define hwloc_ffs(x) ffs(x) +# define HWLOC_NEED_FFSL + +#else /* no ffs implementation */ + +# define HWLOC_NO_FFS + +#endif + +#ifdef HWLOC_NO_FFS + +/* no ffs or it is known to be broken */ +static __hwloc_inline int +hwloc_ffsl_manual(unsigned long x) __hwloc_attribute_const; +static __hwloc_inline int +hwloc_ffsl_manual(unsigned long x) +{ + int i; + + if (!x) + return 0; + + i = 1; +#if HWLOC_BITS_PER_LONG >= 64 + if (!(x & 0xfffffffful)) { + x >>= 32; + i += 32; + } +#endif + if (!(x & 0xffffu)) { + x >>= 16; + i += 16; + } + if (!(x & 0xff)) { + x >>= 8; + i += 8; + } + if (!(x & 0xf)) { + x >>= 4; + i += 4; + } + if (!(x & 0x3)) { + x >>= 2; + i += 2; + } + if (!(x & 0x1)) { + x >>= 1; + i += 1; + } + + return i; +} +/* always define hwloc_ffsl as a macro, to avoid renaming breakage */ +#define hwloc_ffsl hwloc_ffsl_manual + +#elif defined(HWLOC_NEED_FFSL) + +/* We only have an int ffs(int) implementation, build a long one. */ + +/* First make it 32 bits if it was only 16. */ +static __hwloc_inline int +hwloc_ffs32(unsigned long x) __hwloc_attribute_const; +static __hwloc_inline int +hwloc_ffs32(unsigned long x) +{ +#if HWLOC_BITS_PER_INT == 16 + int low_ffs, hi_ffs; + + low_ffs = hwloc_ffs(x & 0xfffful); + if (low_ffs) + return low_ffs; + + hi_ffs = hwloc_ffs(x >> 16); + if (hi_ffs) + return hi_ffs + 16; + + return 0; +#else + return hwloc_ffs(x); +#endif +} + +/* Then make it 64 bit if longs are. */ +static __hwloc_inline int +hwloc_ffsl_from_ffs32(unsigned long x) __hwloc_attribute_const; +static __hwloc_inline int +hwloc_ffsl_from_ffs32(unsigned long x) +{ +#if HWLOC_BITS_PER_LONG == 64 + int low_ffs, hi_ffs; + + low_ffs = hwloc_ffs32(x & 0xfffffffful); + if (low_ffs) + return low_ffs; + + hi_ffs = hwloc_ffs32(x >> 32); + if (hi_ffs) + return hi_ffs + 32; + + return 0; +#else + return hwloc_ffs32(x); +#endif +} +/* always define hwloc_ffsl as a macro, to avoid renaming breakage */ +#define hwloc_ffsl hwloc_ffsl_from_ffs32 + +#endif + +/** + * flsl helpers. + */ +#ifdef __GNUC__ + +# if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4)) +# define hwloc_flsl(x) (x ? 8*sizeof(long) - __builtin_clzl(x) : 0) +# else +# define hwloc_fls(x) (x ? 8*sizeof(int) - __builtin_clz(x) : 0) +# define HWLOC_NEED_FLSL +# endif + +#elif defined(HWLOC_HAVE_FLSL) + +# ifndef HWLOC_HAVE_DECL_FLSL +extern int flsl(long) __hwloc_attribute_const; +# endif + +# define hwloc_flsl(x) flsl(x) + +#elif defined(HWLOC_HAVE_CLZL) + +# ifndef HWLOC_HAVE_DECL_CLZL +extern int clzl(long) __hwloc_attribute_const; +# endif + +# define hwloc_flsl(x) (x ? 8*sizeof(long) - clzl(x) : 0) + +#elif defined(HWLOC_HAVE_FLS) + +# ifndef HWLOC_HAVE_DECL_FLS +extern int fls(int) __hwloc_attribute_const; +# endif + +# define hwloc_fls(x) fls(x) +# define HWLOC_NEED_FLSL + +#elif defined(HWLOC_HAVE_CLZ) + +# ifndef HWLOC_HAVE_DECL_CLZ +extern int clz(int) __hwloc_attribute_const; +# endif + +# define hwloc_fls(x) (x ? 8*sizeof(int) - clz(x) : 0) +# define HWLOC_NEED_FLSL + +#else /* no fls implementation */ + +static __hwloc_inline int +hwloc_flsl_manual(unsigned long x) __hwloc_attribute_const; +static __hwloc_inline int +hwloc_flsl_manual(unsigned long x) +{ + int i = 0; + + if (!x) + return 0; + + i = 1; +#if HWLOC_BITS_PER_LONG >= 64 + if ((x & 0xffffffff00000000ul)) { + x >>= 32; + i += 32; + } +#endif + if ((x & 0xffff0000u)) { + x >>= 16; + i += 16; + } + if ((x & 0xff00)) { + x >>= 8; + i += 8; + } + if ((x & 0xf0)) { + x >>= 4; + i += 4; + } + if ((x & 0xc)) { + x >>= 2; + i += 2; + } + if ((x & 0x2)) { + x >>= 1; + i += 1; + } + + return i; +} +/* always define hwloc_flsl as a macro, to avoid renaming breakage */ +#define hwloc_flsl hwloc_flsl_manual + +#endif
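+
+/* [Editorial example, not part of upstream hwloc.] Once the fallback
+ * wrappers below are taken into account, hwloc_ffsl()/hwloc_flsl() follow
+ * the classic ffs()/fls() contract: returned bit positions are 1-based and
+ * 0 means "no bit set". A minimal sketch: */
+#if 0 /* illustration only, kept out of the build */
+#include <assert.h>
+static void example_bit_scan(void)
+{
+  assert(hwloc_ffsl(0x0UL) == 0);  /* no bit set */
+  assert(hwloc_ffsl(0x1UL) == 1);  /* bit 0 => position 1 */
+  assert(hwloc_ffsl(0x18UL) == 4); /* lowest set bit is bit 3 */
+  assert(hwloc_flsl(0x18UL) == 5); /* highest set bit is bit 4 */
+}
+#endif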
+ +#ifdef HWLOC_NEED_FLSL + +/* We only have an int fls(int) implementation, build a long one. */ + +/* First make it 32 bits if it was only 16. */ +static __hwloc_inline int +hwloc_fls32(unsigned long x) __hwloc_attribute_const; +static __hwloc_inline int +hwloc_fls32(unsigned long x) +{ +#if HWLOC_BITS_PER_INT == 16 + int low_fls, hi_fls; + + hi_fls = hwloc_fls(x >> 16); + if (hi_fls) + return hi_fls + 16; + + low_fls = hwloc_fls(x & 0xfffful); + if (low_fls) + return low_fls; + + return 0; +#else + return hwloc_fls(x); +#endif +} + +/* Then make it 64 bit if longs are. */ +static __hwloc_inline int +hwloc_flsl_from_fls32(unsigned long x) __hwloc_attribute_const; +static __hwloc_inline int +hwloc_flsl_from_fls32(unsigned long x) +{ +#if HWLOC_BITS_PER_LONG == 64 + int low_fls, hi_fls; + + hi_fls = hwloc_fls32(x >> 32); + if (hi_fls) + return hi_fls + 32; + + low_fls = hwloc_fls32(x & 0xfffffffful); + if (low_fls) + return low_fls; + + return 0; +#else + return hwloc_fls32(x); +#endif +} +/* always define hwloc_flsl as a macro, to avoid renaming breakage */ +#define hwloc_flsl hwloc_flsl_from_fls32 + +#endif + +static __hwloc_inline int +hwloc_weight_long(unsigned long w) __hwloc_attribute_const; +static __hwloc_inline int +hwloc_weight_long(unsigned long w) +{ +#if HWLOC_BITS_PER_LONG == 32 +#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4)) + return __builtin_popcount(w); +#else + unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555); + res = (res & 0x33333333) + ((res >> 2) & 0x33333333); + res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F); + res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF); + return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF); +#endif +#else /* HWLOC_BITS_PER_LONG == 32 */ +#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4)) + return __builtin_popcountll(w); +#else + unsigned long res; + res = (w & 0x5555555555555555ul) + ((w >> 1) & 0x5555555555555555ul); + res = (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul); + res = (res & 0x0F0F0F0F0F0F0F0Ful) + ((res >> 4) & 0x0F0F0F0F0F0F0F0Ful); + res = (res & 0x00FF00FF00FF00FFul) + ((res >> 8) & 0x00FF00FF00FF00FFul); + res = (res & 0x0000FFFF0000FFFFul) + ((res >> 16) & 0x0000FFFF0000FFFFul); + return (res & 0x00000000FFFFFFFFul) + ((res >> 32) & 0x00000000FFFFFFFFul); +#endif +#endif /* HWLOC_BITS_PER_LONG == 32 */ +} + +#if !HAVE_DECL_STRTOULL +unsigned long long int strtoull(const char *nptr, char **endptr, int base); +#endif
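+
+/* [Editorial example, not part of upstream hwloc.] hwloc_weight_long() above
+ * is a population count: it returns how many bits are set in an unsigned
+ * long, via __builtin_popcount*() on gcc >= 3.4 and a divide-and-conquer
+ * adder otherwise. A minimal sketch: */
+#if 0 /* illustration only, kept out of the build */
+#include <assert.h>
+static void example_weight(void)
+{
+  assert(hwloc_weight_long(0x0UL) == 0);
+  assert(hwloc_weight_long(0xF0UL) == 4); /* four bits set */
+  assert(hwloc_weight_long(~0x0UL) == HWLOC_BITS_PER_LONG);
+}
+#endif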
+ +#endif /* HWLOC_PRIVATE_MISC_H */ diff --git a/ext/hwloc/include/private/private.h b/ext/hwloc/include/private/private.h new file mode 100644 index 000000000..5e684b0d6 --- /dev/null +++ b/ext/hwloc/include/private/private.h @@ -0,0 +1,300 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2014 Inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. + * + * See COPYING in top-level directory. + */ + +/* Internal types and helpers. */ + + +#ifdef HWLOC_INSIDE_PLUGIN +/* + * these declarations are internal only, they are not available to plugins + * (many functions below are internal static symbols). + */ +#error This file should not be used in plugins +#endif + + +#ifndef HWLOC_PRIVATE_H +#define HWLOC_PRIVATE_H + +#include <private/autogen/config.h> +#include <hwloc.h> +#include <hwloc/bitmap.h> +#include <private/components.h> +#include <private/debug.h> +#include <sys/types.h> +#ifdef HAVE_UNISTD_H +#include <unistd.h> +#endif +#ifdef HAVE_STDINT_H +#include <stdint.h> +#endif +#ifdef HAVE_SYS_UTSNAME_H +#include <sys/utsname.h> +#endif +#include <string.h> + +enum hwloc_ignore_type_e { + HWLOC_IGNORE_TYPE_NEVER = 0, + HWLOC_IGNORE_TYPE_KEEP_STRUCTURE, + HWLOC_IGNORE_TYPE_ALWAYS +}; + +#define HWLOC_DEPTH_MAX 128 + +struct hwloc_topology { + unsigned nb_levels; /* Number of horizontal levels */ + unsigned next_group_depth; /* Depth of the next Group object that we may create */ + unsigned level_nbobjects[HWLOC_DEPTH_MAX]; /* Number of objects on each horizontal level */ + struct hwloc_obj **levels[HWLOC_DEPTH_MAX]; /* Direct access to levels, levels[l = 0 .. nblevels-1][0..level_nbobjects[l]] */ + unsigned long flags; + int type_depth[HWLOC_OBJ_TYPE_MAX]; + enum hwloc_ignore_type_e ignored_types[HWLOC_OBJ_TYPE_MAX]; + int is_thissystem; + int is_loaded; + hwloc_pid_t pid; /* Process ID the topology is viewed from, 0 for self */ + + unsigned bridge_nbobjects; + struct hwloc_obj **bridge_level; + struct hwloc_obj *first_bridge, *last_bridge; + unsigned pcidev_nbobjects; + struct hwloc_obj **pcidev_level; + struct hwloc_obj *first_pcidev, *last_pcidev; + unsigned osdev_nbobjects; + struct hwloc_obj **osdev_level; + struct hwloc_obj *first_osdev, *last_osdev; + + struct hwloc_binding_hooks { + int (*set_thisproc_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags); + int (*get_thisproc_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); + int (*set_thisthread_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags); + int (*get_thisthread_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); + int (*set_proc_cpubind)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_cpuset_t set, int flags); + int (*get_proc_cpubind)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags); +#ifdef hwloc_thread_t + int (*set_thread_cpubind)(hwloc_topology_t topology, hwloc_thread_t tid, hwloc_const_cpuset_t set, int flags); + int (*get_thread_cpubind)(hwloc_topology_t topology, hwloc_thread_t tid, hwloc_cpuset_t set, int flags); +#endif + + int (*get_thisproc_last_cpu_location)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); + int (*get_thisthread_last_cpu_location)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); + int (*get_proc_last_cpu_location)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags); + + int (*set_thisproc_membind)(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); + int 
(*get_thisproc_membind)(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); + int (*set_thisthread_membind)(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); + int (*get_thisthread_membind)(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); + int (*set_proc_membind)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); + int (*get_proc_membind)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); + int (*set_area_membind)(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); + int (*get_area_membind)(hwloc_topology_t topology, const void *addr, size_t len, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); + /* This has to return the same kind of pointer as alloc_membind, so that free_membind can be used on it */ + void *(*alloc)(hwloc_topology_t topology, size_t len); + /* alloc_membind has to always succeed if !(flags & HWLOC_MEMBIND_STRICT). + * see hwloc_alloc_or_fail which is convenient for that. */ + void *(*alloc_membind)(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); + int (*free_membind)(hwloc_topology_t topology, void *addr, size_t len); + } binding_hooks; + + struct hwloc_topology_support support; + + void (*userdata_export_cb)(void *reserved, struct hwloc_topology *topology, struct hwloc_obj *obj); + void (*userdata_import_cb)(struct hwloc_topology *topology, struct hwloc_obj *obj, const char *name, const void *buffer, size_t length); + + struct hwloc_os_distances_s { + hwloc_obj_type_t type; + int nbobjs; + unsigned *indexes; /* array of OS indexes before we can convert them into objs. always available. + */ + struct hwloc_obj **objs; /* array of objects, in the same order as above. + * either given (by a backend) together with the indexes array above. + * or build from the above indexes array when not given (by the user). + */ + float *distances; /* distance matrices, ordered according to the above indexes/objs array. + * distance from i to j is stored in slot i*nbnodes+j. + * will be copied into the main logical-index-ordered distance at the end of the discovery. + */ + int forced; /* set if the user forced a matrix to ignore the OS one */ + + struct hwloc_os_distances_s *prev, *next; + } *first_osdist, *last_osdist; + + /* list of enabled backends. 
*/ + struct hwloc_backend * backends; +}; + +extern void hwloc_alloc_obj_cpusets(hwloc_obj_t obj); +extern void hwloc_setup_pu_level(struct hwloc_topology *topology, unsigned nb_pus); +extern int hwloc_get_sysctlbyname(const char *name, int64_t *n); +extern int hwloc_get_sysctl(int name[], unsigned namelen, int *n); +extern unsigned hwloc_fallback_nbprocessors(struct hwloc_topology *topology); +extern void hwloc_connect_children(hwloc_obj_t obj); +extern int hwloc_connect_levels(hwloc_topology_t topology); + +extern void hwloc_topology_setup_defaults(struct hwloc_topology *topology); +extern void hwloc_topology_clear(struct hwloc_topology *topology); + +/* set native OS binding hooks */ +extern void hwloc_set_native_binding_hooks(struct hwloc_binding_hooks *hooks, struct hwloc_topology_support *support); +/* set either native OS binding hooks (if thissystem), or dummy ones */ +extern void hwloc_set_binding_hooks(struct hwloc_topology *topology); + +#if defined(HWLOC_LINUX_SYS) +extern void hwloc_set_linuxfs_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_LINUX_SYS */ + +#if defined(HWLOC_BGQ_SYS) +extern void hwloc_set_bgq_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_BGQ_SYS */ + +#ifdef HWLOC_SOLARIS_SYS +extern void hwloc_set_solaris_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_SOLARIS_SYS */ + +#ifdef HWLOC_AIX_SYS +extern void hwloc_set_aix_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_AIX_SYS */ + +#ifdef HWLOC_OSF_SYS +extern void hwloc_set_osf_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_OSF_SYS */ + +#ifdef HWLOC_WIN_SYS +extern void hwloc_set_windows_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_WIN_SYS */ + +#ifdef HWLOC_DARWIN_SYS +extern void hwloc_set_darwin_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_DARWIN_SYS */ + +#ifdef HWLOC_FREEBSD_SYS +extern void hwloc_set_freebsd_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_FREEBSD_SYS */ + +#ifdef HWLOC_NETBSD_SYS +extern void hwloc_set_netbsd_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_NETBSD_SYS */ + +#ifdef HWLOC_HPUX_SYS +extern void hwloc_set_hpux_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); +#endif /* HWLOC_HPUX_SYS */ + +/* Insert uname-specific names/values in the object infos array */ +extern void hwloc_add_uname_info(struct hwloc_topology *topology); + +/* Free obj and its attributes assuming it doesn't have any children/parent anymore */ +extern void hwloc_free_unlinked_object(hwloc_obj_t obj); + +/* Duplicate src and its children under newparent in newtopology */ +extern void hwloc__duplicate_objects(struct hwloc_topology *newtopology, struct hwloc_obj *newparent, struct hwloc_obj *src); + +/* This can be used for the alloc field to get allocated data that can be freed by free() */ +void *hwloc_alloc_heap(hwloc_topology_t topology, size_t len); + +/* This can be used for the alloc field to get allocated data that can be freed by munmap() */ +void *hwloc_alloc_mmap(hwloc_topology_t topology, size_t len); + +/* This can be used for 
the free_membind field to free data using free() */ +int hwloc_free_heap(hwloc_topology_t topology, void *addr, size_t len); + +/* This can be used for the free_membind field to free data using munmap() */ +int hwloc_free_mmap(hwloc_topology_t topology, void *addr, size_t len); + +/* Allocates unbound memory or fail, depending on whether STRICT is requested + * or not */ +static __hwloc_inline void * +hwloc_alloc_or_fail(hwloc_topology_t topology, size_t len, int flags) +{ + if (flags & HWLOC_MEMBIND_STRICT) + return NULL; + return hwloc_alloc(topology, len); +} + +extern void hwloc_distances_init(struct hwloc_topology *topology); +extern void hwloc_distances_destroy(struct hwloc_topology *topology); +extern void hwloc_distances_set(struct hwloc_topology *topology, hwloc_obj_type_t type, unsigned nbobjs, unsigned *indexes, hwloc_obj_t *objs, float *distances, int force); +extern void hwloc_distances_set_from_env(struct hwloc_topology *topology); +extern void hwloc_distances_restrict_os(struct hwloc_topology *topology); +extern void hwloc_distances_restrict(struct hwloc_topology *topology, unsigned long flags); +extern void hwloc_distances_finalize_os(struct hwloc_topology *topology); +extern void hwloc_distances_finalize_logical(struct hwloc_topology *topology); +extern void hwloc_clear_object_distances(struct hwloc_obj *obj); +extern void hwloc_clear_object_distances_one(struct hwloc_distances_s *distances); +extern void hwloc_group_by_distances(struct hwloc_topology *topology); + +#ifdef HAVE_USELOCALE +#include "locale.h" +#ifdef HAVE_XLOCALE_H +#include "xlocale.h" +#endif +#define hwloc_localeswitch_declare locale_t __old_locale = (locale_t)0, __new_locale +#define hwloc_localeswitch_init() do { \ + __new_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0); \ + if (__new_locale != (locale_t)0) \ + __old_locale = uselocale(__new_locale); \ +} while (0) +#define hwloc_localeswitch_fini() do { \ + if (__new_locale != (locale_t)0) { \ + uselocale(__old_locale); \ + freelocale(__new_locale); \ + } \ +} while(0) +#else /* HAVE_USELOCALE */ +#define hwloc_localeswitch_declare int __dummy_nolocale __hwloc_attribute_unused +#define hwloc_localeswitch_init() +#define hwloc_localeswitch_fini() +#endif /* HAVE_USELOCALE */ + +#if !HAVE_DECL_FABSF +#define fabsf(f) fabs((double)(f)) +#endif + +#if HAVE_DECL__SC_PAGE_SIZE +#define hwloc_getpagesize() sysconf(_SC_PAGE_SIZE) +#elif HAVE_DECL__SC_PAGESIZE +#define hwloc_getpagesize() sysconf(_SC_PAGESIZE) +#elif defined HAVE_GETPAGESIZE +#define hwloc_getpagesize() getpagesize() +#else +#undef hwloc_getpagesize +#endif + +/* encode src buffer into target buffer. + * targsize must be at least 4*((srclength+2)/3)+1. + * target will be 0-terminated. + */ +extern int hwloc_encode_to_base64(const char *src, size_t srclength, char *target, size_t targsize); +/* decode src buffer into target buffer. + * src is 0-terminated. + * targsize must be at least srclength*3/4+1 (srclength not including \0) + * but only srclength*3/4 characters will be meaningful + * (the next one may be partially written during decoding, but it should be ignored). 
+ */ +extern int hwloc_decode_from_base64(char const *src, char *target, size_t targsize); + +/* Check whether needle matches the beginning of haystack, at least n, and up + * to a colon or \0 */ +extern int hwloc_namecoloncmp(const char *haystack, const char *needle, size_t n); + +#ifdef HWLOC_HAVE_ATTRIBUTE_FORMAT +# if HWLOC_HAVE_ATTRIBUTE_FORMAT +# define __hwloc_attribute_format(type, str, arg) __attribute__((__format__(type, str, arg))) +# else +# define __hwloc_attribute_format(type, str, arg) +# endif +#else +# define __hwloc_attribute_format(type, str, arg) +#endif + +/* On some systems, snprintf returns the size of written data, not the actually + * required size. hwloc_snprintf always report the actually required size. */ +extern int hwloc_snprintf(char *str, size_t size, const char *format, ...) __hwloc_attribute_format(printf, 3, 4); + +extern void hwloc_obj_add_info_nodup(hwloc_obj_t obj, const char *name, const char *value, int nodup); + +#endif /* HWLOC_PRIVATE_H */ diff --git a/ext/hwloc/include/private/solaris-chiptype.h b/ext/hwloc/include/private/solaris-chiptype.h new file mode 100644 index 000000000..b84555b3f --- /dev/null +++ b/ext/hwloc/include/private/solaris-chiptype.h @@ -0,0 +1,59 @@ +/* + * Copyright (c) 2009-2010 Oracle and/or its affiliates. All rights reserved. + * + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + + +#ifdef HWLOC_INSIDE_PLUGIN +/* + * these declarations are internal only, they are not available to plugins + * (functions below are internal static symbols). + */ +#error This file should not be used in plugins +#endif + + +#ifndef HWLOC_PRIVATE_SOLARIS_CHIPTYPE_H +#define HWLOC_PRIVATE_SOLARIS_CHIPTYPE_H + +/* SPARC Chip Modes. */ +#define MODE_UNKNOWN 0 +#define MODE_SPITFIRE 1 +#define MODE_BLACKBIRD 2 +#define MODE_CHEETAH 3 +#define MODE_SPARC64_VI 4 +#define MODE_T1 5 +#define MODE_T2 6 +#define MODE_SPARC64_VII 7 +#define MODE_ROCK 8 + +/* SPARC Chip Implementations. */ +#define IMPL_SPARC64_VI 0x6 +#define IMPL_SPARC64_VII 0x7 +#define IMPL_SPITFIRE 0x10 +#define IMPL_BLACKBIRD 0x11 +#define IMPL_SABRE 0x12 +#define IMPL_HUMMINGBIRD 0x13 +#define IMPL_CHEETAH 0x14 +#define IMPL_CHEETAHPLUS 0x15 +#define IMPL_JALAPENO 0x16 +#define IMPL_JAGUAR 0x18 +#define IMPL_PANTHER 0x19 +#define IMPL_NIAGARA 0x23 +#define IMPL_NIAGARA_2 0x24 +#define IMPL_ROCK 0x25 + +/* Default Mfg, Cache, Speed settings */ +#define TI_MANUFACTURER 0x17 +#define TWO_MEG_CACHE 2097152 +#define SPITFIRE_SPEED 142943750 + +char* hwloc_solaris_get_chip_type(void); +char* hwloc_solaris_get_chip_model(void); + +#endif /* HWLOC_PRIVATE_SOLARIS_CHIPTYPE_H */ diff --git a/ext/hwloc/include/private/xml.h b/ext/hwloc/include/private/xml.h new file mode 100644 index 000000000..fa59050f1 --- /dev/null +++ b/ext/hwloc/include/private/xml.h @@ -0,0 +1,86 @@ +/* + * Copyright © 2009-2013 Inria. All rights reserved. + * See COPYING in top-level directory. 
+ */ + +#ifndef PRIVATE_XML_H +#define PRIVATE_XML_H 1 + +#include + +#include + +HWLOC_DECLSPEC int hwloc__xml_verbose(void); + +typedef struct hwloc__xml_import_state_s { + struct hwloc__xml_import_state_s *parent; + + int (*next_attr)(struct hwloc__xml_import_state_s * state, char **namep, char **valuep); + int (*find_child)(struct hwloc__xml_import_state_s * state, struct hwloc__xml_import_state_s * childstate, char **tagp); + int (*close_tag)(struct hwloc__xml_import_state_s * state); /* look for an explicit closing tag */ + void (*close_child)(struct hwloc__xml_import_state_s * state); + int (*get_content)(struct hwloc__xml_import_state_s * state, char **beginp, size_t expected_length); + void (*close_content)(struct hwloc__xml_import_state_s * state); + + /* opaque data used to store backend-specific data. + * statically allocated to allow stack-allocation by the common code without knowing actual backend needs. + */ + char data[32]; +} * hwloc__xml_import_state_t; + +HWLOC_DECLSPEC int hwloc__xml_import_diff(hwloc__xml_import_state_t state, hwloc_topology_diff_t *firstdiffp); + +struct hwloc_xml_backend_data_s { + /* xml backend parameters */ + int (*look_init)(struct hwloc_xml_backend_data_s *bdata, struct hwloc__xml_import_state_s *state); + void (*look_failed)(struct hwloc_xml_backend_data_s *bdata); + void (*backend_exit)(struct hwloc_xml_backend_data_s *bdata); + void *data; /* libxml2 doc, or nolibxml buffer */ + struct hwloc_xml_imported_distances_s { + hwloc_obj_t root; + struct hwloc_distances_s distances; + struct hwloc_xml_imported_distances_s *prev, *next; + } *first_distances, *last_distances; +}; + +typedef struct hwloc__xml_export_state_s { + struct hwloc__xml_export_state_s *parent; + + void (*new_child)(struct hwloc__xml_export_state_s *parentstate, struct hwloc__xml_export_state_s *state, const char *name); + void (*new_prop)(struct hwloc__xml_export_state_s *state, const char *name, const char *value); + void (*add_content)(struct hwloc__xml_export_state_s *state, const char *buffer, size_t length); + void (*end_object)(struct hwloc__xml_export_state_s *state, const char *name); + + /* opaque data used to store backend-specific data. + * statically allocated to allow stack-allocation by the common code without knowing actual backend needs. 
+ */ + char data[40]; +} * hwloc__xml_export_state_t; + +HWLOC_DECLSPEC void hwloc__xml_export_object (hwloc__xml_export_state_t state, struct hwloc_topology *topology, struct hwloc_obj *obj); + +HWLOC_DECLSPEC void hwloc__xml_export_diff(hwloc__xml_export_state_t parentstate, hwloc_topology_diff_t diff); + +/****************** + * XML components * + ******************/ + +struct hwloc_xml_callbacks { + int (*backend_init)(struct hwloc_xml_backend_data_s *bdata, const char *xmlpath, const char *xmlbuffer, int xmlbuflen); + int (*export_file)(struct hwloc_topology *topology, const char *filename); + int (*export_buffer)(struct hwloc_topology *topology, char **xmlbuffer, int *buflen); + void (*free_buffer)(void *xmlbuffer); + int (*import_diff)(const char *xmlpath, const char *xmlbuffer, int xmlbuflen, hwloc_topology_diff_t *diff, char **refnamep); + int (*export_diff_file)(union hwloc_topology_diff_u *diff, const char *refname, const char *filename); + int (*export_diff_buffer)(union hwloc_topology_diff_u *diff, const char *refname, char **xmlbuffer, int *buflen); +}; + +struct hwloc_xml_component { + struct hwloc_xml_callbacks *nolibxml_callbacks; + struct hwloc_xml_callbacks *libxml_callbacks; +}; + +HWLOC_DECLSPEC void hwloc_xml_callbacks_register(struct hwloc_xml_component *component); +HWLOC_DECLSPEC void hwloc_xml_callbacks_reset(void); + +#endif /* PRIVATE_XML_H */ diff --git a/ext/hwloc/include/static-components.h b/ext/hwloc/include/static-components.h new file mode 100644 index 000000000..6688fcd3b --- /dev/null +++ b/ext/hwloc/include/static-components.h @@ -0,0 +1,21 @@ +HWLOC_DECLSPEC extern const struct hwloc_component hwloc_noos_component; +//HWLOC_DECLSPEC extern const struct hwloc_component hwloc_xml_component; +HWLOC_DECLSPEC extern const struct hwloc_component hwloc_synthetic_component; +HWLOC_DECLSPEC extern const struct hwloc_component hwloc_custom_component; +//HWLOC_DECLSPEC extern const struct hwloc_component hwloc_xml_nolibxml_component; +HWLOC_DECLSPEC extern const struct hwloc_component hwloc_linux_component; +HWLOC_DECLSPEC extern const struct hwloc_component hwloc_linuxpci_component; +//HWLOC_DECLSPEC extern const struct hwloc_component hwloc_xml_libxml_component; +HWLOC_DECLSPEC extern const struct hwloc_component hwloc_x86_component; +static const struct hwloc_component * hwloc_static_components[] = { + &hwloc_noos_component, + //&hwloc_xml_component, + &hwloc_synthetic_component, + &hwloc_custom_component, + //&hwloc_xml_nolibxml_component, + &hwloc_linux_component, + &hwloc_linuxpci_component, + //&hwloc_xml_libxml_component, + &hwloc_x86_component, + NULL +}; diff --git a/ext/hwloc/src/base64.c b/ext/hwloc/src/base64.c new file mode 100644 index 000000000..89cd00315 --- /dev/null +++ b/ext/hwloc/src/base64.c @@ -0,0 +1,306 @@ +/* + * Copyright © 2012 Inria. All rights reserved. + * See COPYING in top-level directory. + * + * Modifications after import: + * - removed all #if + * - updated prototypes + * - updated #include + */ + +/* $OpenBSD: base64.c,v 1.5 2006/10/21 09:55:03 otto Exp $ */ + +/* + * Copyright (c) 1996 by Internet Software Consortium. + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS" AND INTERNET SOFTWARE CONSORTIUM DISCLAIMS + * ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL INTERNET SOFTWARE + * CONSORTIUM BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL + * DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR + * PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS + * ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS + * SOFTWARE. + */ + +/* + * Portions Copyright (c) 1995 by International Business Machines, Inc. + * + * International Business Machines, Inc. (hereinafter called IBM) grants + * permission under its copyrights to use, copy, modify, and distribute this + * Software with or without fee, provided that the above copyright notice and + * all paragraphs of this notice appear in all copies, and that the name of IBM + * not be used in connection with the marketing of any product incorporating + * the Software or modifications thereof, without specific, written prior + * permission. + * + * To the extent it has a right to do so, IBM grants an immunity from suit + * under its patents, if any, for the use, sale or manufacture of products to + * the extent that such products are used for performing Domain Name System + * dynamic updates in TCP/IP networks by means of the Software. No immunity is + * granted for any product per se or for any other function of any product. + * + * THE SOFTWARE IS PROVIDED "AS IS", AND IBM DISCLAIMS ALL WARRANTIES, + * INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE. IN NO EVENT SHALL IBM BE LIABLE FOR ANY SPECIAL, + * DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING + * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE, EVEN + * IF IBM IS APPRISED OF THE POSSIBILITY OF SUCH DAMAGES. + */ + +/* OPENBSD ORIGINAL: lib/libc/net/base64.c */ + +static const char Base64[] = + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; +static const char Pad64 = '='; + +/* (From RFC1521 and draft-ietf-dnssec-secext-03.txt) + The following encoding technique is taken from RFC 1521 by Borenstein + and Freed. It is reproduced here in a slightly edited form for + convenience. + + A 65-character subset of US-ASCII is used, enabling 6 bits to be + represented per printable character. (The extra 65th character, "=", + is used to signify a special processing function.) + + The encoding process represents 24-bit groups of input bits as output + strings of 4 encoded characters. Proceeding from left to right, a + 24-bit input group is formed by concatenating 3 8-bit input groups. + These 24 bits are then treated as 4 concatenated 6-bit groups, each + of which is translated into a single digit in the base64 alphabet. + + Each 6-bit group is used as an index into an array of 64 printable + characters. The character referenced by the index is placed in the + output string. 
+ + Table 1: The Base64 Alphabet + + Value Encoding Value Encoding Value Encoding Value Encoding + 0 A 17 R 34 i 51 z + 1 B 18 S 35 j 52 0 + 2 C 19 T 36 k 53 1 + 3 D 20 U 37 l 54 2 + 4 E 21 V 38 m 55 3 + 5 F 22 W 39 n 56 4 + 6 G 23 X 40 o 57 5 + 7 H 24 Y 41 p 58 6 + 8 I 25 Z 42 q 59 7 + 9 J 26 a 43 r 60 8 + 10 K 27 b 44 s 61 9 + 11 L 28 c 45 t 62 + + 12 M 29 d 46 u 63 / + 13 N 30 e 47 v + 14 O 31 f 48 w (pad) = + 15 P 32 g 49 x + 16 Q 33 h 50 y + + Special processing is performed if fewer than 24 bits are available + at the end of the data being encoded. A full encoding quantum is + always completed at the end of a quantity. When fewer than 24 input + bits are available in an input group, zero bits are added (on the + right) to form an integral number of 6-bit groups. Padding at the + end of the data is performed using the '=' character. + + Since all base64 input is an integral number of octets, only the + ------------------------------------------------- + following cases can arise: + + (1) the final quantum of encoding input is an integral + multiple of 24 bits; here, the final unit of encoded + output will be an integral multiple of 4 characters + with no "=" padding, + (2) the final quantum of encoding input is exactly 8 bits; + here, the final unit of encoded output will be two + characters followed by two "=" padding characters, or + (3) the final quantum of encoding input is exactly 16 bits; + here, the final unit of encoded output will be three + characters followed by one "=" padding character. + */ + +#include +#include +#include + +#include + +int +hwloc_encode_to_base64(const char *src, size_t srclength, char *target, size_t targsize) +{ + size_t datalength = 0; + unsigned char input[3]; + unsigned char output[4]; + unsigned int i; + + while (2 < srclength) { + input[0] = *src++; + input[1] = *src++; + input[2] = *src++; + srclength -= 3; + + output[0] = input[0] >> 2; + output[1] = ((input[0] & 0x03) << 4) + (input[1] >> 4); + output[2] = ((input[1] & 0x0f) << 2) + (input[2] >> 6); + output[3] = input[2] & 0x3f; + + if (datalength + 4 > targsize) + return (-1); + target[datalength++] = Base64[output[0]]; + target[datalength++] = Base64[output[1]]; + target[datalength++] = Base64[output[2]]; + target[datalength++] = Base64[output[3]]; + } + + /* Now we worry about padding. */ + if (0 != srclength) { + /* Get what's left. */ + input[0] = input[1] = input[2] = '\0'; + for (i = 0; i < srclength; i++) + input[i] = *src++; + + output[0] = input[0] >> 2; + output[1] = ((input[0] & 0x03) << 4) + (input[1] >> 4); + output[2] = ((input[1] & 0x0f) << 2) + (input[2] >> 6); + + if (datalength + 4 > targsize) + return (-1); + target[datalength++] = Base64[output[0]]; + target[datalength++] = Base64[output[1]]; + if (srclength == 1) + target[datalength++] = Pad64; + else + target[datalength++] = Base64[output[2]]; + target[datalength++] = Pad64; + } + if (datalength >= targsize) + return (-1); + target[datalength] = '\0'; /* Returned value doesn't count \0. */ + return (datalength); +} + +/* skips all whitespace anywhere. + converts characters, four at a time, starting at (or after) + src from base - 64 numbers into three 8 bit bytes in the target area. + it returns the number of data bytes stored at the target, or -1 on error. + */ + +int +hwloc_decode_from_base64(char const *src, char *target, size_t targsize) +{ + unsigned int tarindex, state; + int ch; + char *pos; + + state = 0; + tarindex = 0; + + while ((ch = *src++) != '\0') { + if (isspace(ch)) /* Skip whitespace anywhere. 
*/ + continue; + + if (ch == Pad64) + break; + + pos = strchr(Base64, ch); + if (pos == 0) /* A non-base64 character. */ + return (-1); + + switch (state) { + case 0: + if (target) { + if (tarindex >= targsize) + return (-1); + target[tarindex] = (pos - Base64) << 2; + } + state = 1; + break; + case 1: + if (target) { + if (tarindex + 1 >= targsize) + return (-1); + target[tarindex] |= (pos - Base64) >> 4; + target[tarindex+1] = ((pos - Base64) & 0x0f) + << 4 ; + } + tarindex++; + state = 2; + break; + case 2: + if (target) { + if (tarindex + 1 >= targsize) + return (-1); + target[tarindex] |= (pos - Base64) >> 2; + target[tarindex+1] = ((pos - Base64) & 0x03) + << 6; + } + tarindex++; + state = 3; + break; + case 3: + if (target) { + if (tarindex >= targsize) + return (-1); + target[tarindex] |= (pos - Base64); + } + tarindex++; + state = 0; + break; + } + } + + /* + * We are done decoding Base-64 chars. Let's see if we ended + * on a byte boundary, and/or with erroneous trailing characters. + */ + + if (ch == Pad64) { /* We got a pad char. */ + ch = *src++; /* Skip it, get next. */ + switch (state) { + case 0: /* Invalid = in first position */ + case 1: /* Invalid = in second position */ + return (-1); + + case 2: /* Valid, means one byte of info */ + /* Skip any number of spaces. */ + for (; ch != '\0'; ch = *src++) + if (!isspace(ch)) + break; + /* Make sure there is another trailing = sign. */ + if (ch != Pad64) + return (-1); + ch = *src++; /* Skip the = */ + /* Fall through to "single trailing =" case. */ + /* FALLTHROUGH */ + + case 3: /* Valid, means two bytes of info */ + /* + * We know this char is an =. Is there anything but + * whitespace after it? + */ + for (; ch != '\0'; ch = *src++) + if (!isspace(ch)) + return (-1); + + /* + * Now make sure for cases 2 and 3 that the "extra" + * bits that slopped past the last full byte were + * zeros. If we don't check them, they become a + * subliminal channel. + */ + if (target && target[tarindex] != 0) + return (-1); + } + } else { + /* + * We ended by seeing the end of the string. Make sure we + * have no partial bytes lying around. + */ + if (state != 0) + return (-1); + } + + return (tarindex); +} diff --git a/ext/hwloc/src/bind.c b/ext/hwloc/src/bind.c new file mode 100644 index 000000000..37921bcee --- /dev/null +++ b/ext/hwloc/src/bind.c @@ -0,0 +1,781 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2011 inria. All rights reserved. + * Copyright © 2009-2010, 2012 Université Bordeaux 1 + * Copyright © 2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +#include <private/autogen/config.h> +#include <hwloc.h> +#include <private/private.h> +#include <hwloc/helper.h> +#ifdef HAVE_SYS_MMAN_H +# include <sys/mman.h> +#endif +/* <malloc.h> is only needed if we don't have posix_memalign() */ +#if defined(hwloc_getpagesize) && !defined(HAVE_POSIX_MEMALIGN) && defined(HAVE_MEMALIGN) && defined(HAVE_MALLOC_H) +#include <malloc.h> +#endif +#ifdef HAVE_UNISTD_H +#include <unistd.h> +#endif +#include <stdlib.h> +#include <errno.h> + +/* TODO: HWLOC_GNU_SYS, HWLOC_IRIX_SYS, + * + * IRIX: see MP_MUSTRUN / _DSM_MUSTRUN, pthread_setrunon_np, /hw, process_cpulink, numa_create + * + * We could use glibc's sched_setaffinity generically when it is available + * + * Darwin and OpenBSD don't seem to have binding facilities. */
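+
+/* [Editorial example, not part of upstream hwloc.] A minimal sketch of how
+ * the binding entry points below are typically driven from application
+ * code. Error handling is reduced to perror(); real code should also
+ * consult the topology support flags before binding. */
+#if 0 /* illustration only, kept out of the build */
+#include <stdio.h>
+static void example_bind_to_pu0(void)
+{
+  hwloc_topology_t topology;
+  hwloc_bitmap_t set = hwloc_bitmap_alloc();
+  hwloc_topology_init(&topology);
+  hwloc_topology_load(topology);
+  hwloc_bitmap_only(set, 0); /* PU with OS index 0 */
+  if (hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD) < 0)
+    perror("hwloc_set_cpubind"); /* ENOSYS/EXDEV/EINVAL, see hwloc_fix_cpubind below */
+  hwloc_bitmap_free(set);
+  hwloc_topology_destroy(topology);
+}
+#endif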
+ +static hwloc_const_bitmap_t +hwloc_fix_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t set) +{ + hwloc_const_bitmap_t topology_set = hwloc_topology_get_topology_cpuset(topology); + hwloc_const_bitmap_t complete_set = hwloc_topology_get_complete_cpuset(topology); + + if (!topology_set) { + /* The topology is composed of several systems, the cpuset is thus ambiguous. */ + errno = EXDEV; + return NULL; + } + + if (hwloc_bitmap_iszero(set)) { + errno = EINVAL; + return NULL; + } + + if (!hwloc_bitmap_isincluded(set, complete_set)) { + errno = EINVAL; + return NULL; + } + + if (hwloc_bitmap_isincluded(topology_set, set)) + set = complete_set; + + return set; +} + +int +hwloc_set_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t set, int flags) +{ + set = hwloc_fix_cpubind(topology, set); + if (!set) + return -1; + + if (flags & HWLOC_CPUBIND_PROCESS) { + if (topology->binding_hooks.set_thisproc_cpubind) + return topology->binding_hooks.set_thisproc_cpubind(topology, set, flags); + } else if (flags & HWLOC_CPUBIND_THREAD) { + if (topology->binding_hooks.set_thisthread_cpubind) + return topology->binding_hooks.set_thisthread_cpubind(topology, set, flags); + } else { + if (topology->binding_hooks.set_thisproc_cpubind) + return topology->binding_hooks.set_thisproc_cpubind(topology, set, flags); + else if (topology->binding_hooks.set_thisthread_cpubind) + return topology->binding_hooks.set_thisthread_cpubind(topology, set, flags); + } + + errno = ENOSYS; + return -1; +} + +int +hwloc_get_cpubind(hwloc_topology_t topology, hwloc_bitmap_t set, int flags) +{ + if (flags & HWLOC_CPUBIND_PROCESS) { + if (topology->binding_hooks.get_thisproc_cpubind) + return topology->binding_hooks.get_thisproc_cpubind(topology, set, flags); + } else if (flags & HWLOC_CPUBIND_THREAD) { + if (topology->binding_hooks.get_thisthread_cpubind) + return topology->binding_hooks.get_thisthread_cpubind(topology, set, flags); + } else { + if (topology->binding_hooks.get_thisproc_cpubind) + return topology->binding_hooks.get_thisproc_cpubind(topology, set, flags); + else if (topology->binding_hooks.get_thisthread_cpubind) + return topology->binding_hooks.get_thisthread_cpubind(topology, set, flags); + } + + errno = ENOSYS; + return -1; +} + +int +hwloc_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_bitmap_t set, int flags) +{ + set = hwloc_fix_cpubind(topology, set); + if (!set) + return -1; + + if (topology->binding_hooks.set_proc_cpubind) + return topology->binding_hooks.set_proc_cpubind(topology, pid, set, flags); + + errno = ENOSYS; + return -1; +} + +int +hwloc_get_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_bitmap_t set, int flags) +{ + if (topology->binding_hooks.get_proc_cpubind) + return topology->binding_hooks.get_proc_cpubind(topology, pid, set, flags); + + errno = ENOSYS; + return -1; +} + +#ifdef hwloc_thread_t +int +hwloc_set_thread_cpubind(hwloc_topology_t topology, hwloc_thread_t tid, hwloc_const_bitmap_t set, int flags) +{ + set = hwloc_fix_cpubind(topology, set); + if (!set) + return -1; + + if (topology->binding_hooks.set_thread_cpubind) + return topology->binding_hooks.set_thread_cpubind(topology, tid, set, flags); + + errno = ENOSYS; + return -1; +} + +int +hwloc_get_thread_cpubind(hwloc_topology_t topology, hwloc_thread_t tid, hwloc_bitmap_t set, int flags) +{ + if (topology->binding_hooks.get_thread_cpubind) + return topology->binding_hooks.get_thread_cpubind(topology, tid, set, flags); + + errno = ENOSYS; + return -1; +} +#endif + +int 
+hwloc_get_last_cpu_location(hwloc_topology_t topology, hwloc_bitmap_t set, int flags) +{ + if (flags & HWLOC_CPUBIND_PROCESS) { + if (topology->binding_hooks.get_thisproc_last_cpu_location) + return topology->binding_hooks.get_thisproc_last_cpu_location(topology, set, flags); + } else if (flags & HWLOC_CPUBIND_THREAD) { + if (topology->binding_hooks.get_thisthread_last_cpu_location) + return topology->binding_hooks.get_thisthread_last_cpu_location(topology, set, flags); + } else { + if (topology->binding_hooks.get_thisproc_last_cpu_location) + return topology->binding_hooks.get_thisproc_last_cpu_location(topology, set, flags); + else if (topology->binding_hooks.get_thisthread_last_cpu_location) + return topology->binding_hooks.get_thisthread_last_cpu_location(topology, set, flags); + } + + errno = ENOSYS; + return -1; +} + +int +hwloc_get_proc_last_cpu_location(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_bitmap_t set, int flags) +{ + if (topology->binding_hooks.get_proc_last_cpu_location) + return topology->binding_hooks.get_proc_last_cpu_location(topology, pid, set, flags); + + errno = ENOSYS; + return -1; +} + +static hwloc_const_nodeset_t +hwloc_fix_membind(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset) +{ + hwloc_const_bitmap_t topology_nodeset = hwloc_topology_get_topology_nodeset(topology); + hwloc_const_bitmap_t complete_nodeset = hwloc_topology_get_complete_nodeset(topology); + + if (!hwloc_topology_get_topology_cpuset(topology)) { + /* The topology is composed of several systems, the nodeset is thus + * ambiguous. */ + errno = EXDEV; + return NULL; + } + + if (!complete_nodeset) { + /* There is no NUMA node */ + errno = ENODEV; + return NULL; + } + + if (hwloc_bitmap_iszero(nodeset)) { + errno = EINVAL; + return NULL; + } + + if (!hwloc_bitmap_isincluded(nodeset, complete_nodeset)) { + errno = EINVAL; + return NULL; + } + + if (hwloc_bitmap_isincluded(topology_nodeset, nodeset)) + return complete_nodeset; + + return nodeset; +} + +static int +hwloc_fix_membind_cpuset(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_const_cpuset_t cpuset) +{ + hwloc_const_bitmap_t topology_set = hwloc_topology_get_topology_cpuset(topology); + hwloc_const_bitmap_t complete_set = hwloc_topology_get_complete_cpuset(topology); + hwloc_const_bitmap_t complete_nodeset = hwloc_topology_get_complete_nodeset(topology); + + if (!topology_set) { + /* The topology is composed of several systems, the cpuset is thus + * ambiguous. 
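+   * The caller gets errno set to EXDEV in that case, as in hwloc_fix_cpubind() above.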
*/ + errno = EXDEV; + return -1; + } + + if (!complete_nodeset) { + /* There is no NUMA node */ + errno = ENODEV; + return -1; + } + + if (hwloc_bitmap_iszero(cpuset)) { + errno = EINVAL; + return -1; + } + + if (!hwloc_bitmap_isincluded(cpuset, complete_set)) { + errno = EINVAL; + return -1; + } + + if (hwloc_bitmap_isincluded(topology_set, cpuset)) { + hwloc_bitmap_copy(nodeset, complete_nodeset); + return 0; + } + + hwloc_cpuset_to_nodeset(topology, cpuset, nodeset); + return 0; +} + +int +hwloc_set_membind_nodeset(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) +{ + nodeset = hwloc_fix_membind(topology, nodeset); + if (!nodeset) + return -1; + + if (flags & HWLOC_MEMBIND_PROCESS) { + if (topology->binding_hooks.set_thisproc_membind) + return topology->binding_hooks.set_thisproc_membind(topology, nodeset, policy, flags); + } else if (flags & HWLOC_MEMBIND_THREAD) { + if (topology->binding_hooks.set_thisthread_membind) + return topology->binding_hooks.set_thisthread_membind(topology, nodeset, policy, flags); + } else { + if (topology->binding_hooks.set_thisproc_membind) + return topology->binding_hooks.set_thisproc_membind(topology, nodeset, policy, flags); + else if (topology->binding_hooks.set_thisthread_membind) + return topology->binding_hooks.set_thisthread_membind(topology, nodeset, policy, flags); + } + + errno = ENOSYS; + return -1; +} + +int +hwloc_set_membind(hwloc_topology_t topology, hwloc_const_cpuset_t set, hwloc_membind_policy_t policy, int flags) +{ + hwloc_nodeset_t nodeset = hwloc_bitmap_alloc(); + int ret; + + if (hwloc_fix_membind_cpuset(topology, nodeset, set)) + ret = -1; + else + ret = hwloc_set_membind_nodeset(topology, nodeset, policy, flags); + + hwloc_bitmap_free(nodeset); + return ret; +} + +int +hwloc_get_membind_nodeset(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags) +{ + if (flags & HWLOC_MEMBIND_PROCESS) { + if (topology->binding_hooks.get_thisproc_membind) + return topology->binding_hooks.get_thisproc_membind(topology, nodeset, policy, flags); + } else if (flags & HWLOC_MEMBIND_THREAD) { + if (topology->binding_hooks.get_thisthread_membind) + return topology->binding_hooks.get_thisthread_membind(topology, nodeset, policy, flags); + } else { + if (topology->binding_hooks.get_thisproc_membind) + return topology->binding_hooks.get_thisproc_membind(topology, nodeset, policy, flags); + else if (topology->binding_hooks.get_thisthread_membind) + return topology->binding_hooks.get_thisthread_membind(topology, nodeset, policy, flags); + } + + errno = ENOSYS; + return -1; +} + +int +hwloc_get_membind(hwloc_topology_t topology, hwloc_cpuset_t set, hwloc_membind_policy_t * policy, int flags) +{ + hwloc_nodeset_t nodeset; + int ret; + + nodeset = hwloc_bitmap_alloc(); + ret = hwloc_get_membind_nodeset(topology, nodeset, policy, flags); + + if (!ret) + hwloc_cpuset_from_nodeset(topology, set, nodeset); + + hwloc_bitmap_free(nodeset); + return ret; +} + +int +hwloc_set_proc_membind_nodeset(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) +{ + nodeset = hwloc_fix_membind(topology, nodeset); + if (!nodeset) + return -1; + + if (topology->binding_hooks.set_proc_membind) + return topology->binding_hooks.set_proc_membind(topology, pid, nodeset, policy, flags); + + errno = ENOSYS; + return -1; +} + + +int +hwloc_set_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_cpuset_t set, 
hwloc_membind_policy_t policy, int flags) +{ + hwloc_nodeset_t nodeset = hwloc_bitmap_alloc(); + int ret; + + if (hwloc_fix_membind_cpuset(topology, nodeset, set)) + ret = -1; + else + ret = hwloc_set_proc_membind_nodeset(topology, pid, nodeset, policy, flags); + + hwloc_bitmap_free(nodeset); + return ret; +} + +int +hwloc_get_proc_membind_nodeset(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags) +{ + if (topology->binding_hooks.get_proc_membind) + return topology->binding_hooks.get_proc_membind(topology, pid, nodeset, policy, flags); + + errno = ENOSYS; + return -1; +} + +int +hwloc_get_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, hwloc_membind_policy_t * policy, int flags) +{ + hwloc_nodeset_t nodeset; + int ret; + + nodeset = hwloc_bitmap_alloc(); + ret = hwloc_get_proc_membind_nodeset(topology, pid, nodeset, policy, flags); + + if (!ret) + hwloc_cpuset_from_nodeset(topology, set, nodeset); + + hwloc_bitmap_free(nodeset); + return ret; +} + +int +hwloc_set_area_membind_nodeset(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) +{ + nodeset = hwloc_fix_membind(topology, nodeset); + if (!nodeset) + return -1; + + if (topology->binding_hooks.set_area_membind) + return topology->binding_hooks.set_area_membind(topology, addr, len, nodeset, policy, flags); + + errno = ENOSYS; + return -1; +} + +int +hwloc_set_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_cpuset_t set, hwloc_membind_policy_t policy, int flags) +{ + hwloc_nodeset_t nodeset = hwloc_bitmap_alloc(); + int ret; + + if (hwloc_fix_membind_cpuset(topology, nodeset, set)) + ret = -1; + else + ret = hwloc_set_area_membind_nodeset(topology, addr, len, nodeset, policy, flags); + + hwloc_bitmap_free(nodeset); + return ret; +} + +int +hwloc_get_area_membind_nodeset(hwloc_topology_t topology, const void *addr, size_t len, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags) +{ + if (topology->binding_hooks.get_area_membind) + return topology->binding_hooks.get_area_membind(topology, addr, len, nodeset, policy, flags); + + errno = ENOSYS; + return -1; +} + +int +hwloc_get_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_cpuset_t set, hwloc_membind_policy_t * policy, int flags) +{ + hwloc_nodeset_t nodeset; + int ret; + + nodeset = hwloc_bitmap_alloc(); + ret = hwloc_get_area_membind_nodeset(topology, addr, len, nodeset, policy, flags); + + if (!ret) + hwloc_cpuset_from_nodeset(topology, set, nodeset); + + hwloc_bitmap_free(nodeset); + return ret; +} + +void * +hwloc_alloc_heap(hwloc_topology_t topology __hwloc_attribute_unused, size_t len) +{ + void *p; +#if defined(hwloc_getpagesize) && defined(HAVE_POSIX_MEMALIGN) + errno = posix_memalign(&p, hwloc_getpagesize(), len); + if (errno) + p = NULL; +#elif defined(hwloc_getpagesize) && defined(HAVE_MEMALIGN) + p = memalign(hwloc_getpagesize(), len); +#else + p = malloc(len); +#endif + return p; +} + +#ifdef MAP_ANONYMOUS +void * +hwloc_alloc_mmap(hwloc_topology_t topology __hwloc_attribute_unused, size_t len) +{ + return mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); +} +#endif + +int +hwloc_free_heap(hwloc_topology_t topology __hwloc_attribute_unused, void *addr, size_t len __hwloc_attribute_unused) +{ + free(addr); + return 0; +} + +#ifdef MAP_ANONYMOUS +int +hwloc_free_mmap(hwloc_topology_t topology 
__hwloc_attribute_unused, void *addr, size_t len) +{ + if (!addr) + return 0; + return munmap(addr, len); +} +#endif + +void * +hwloc_alloc(hwloc_topology_t topology, size_t len) +{ + if (topology->binding_hooks.alloc) + return topology->binding_hooks.alloc(topology, len); + return hwloc_alloc_heap(topology, len); +} + +void * +hwloc_alloc_membind_nodeset(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) +{ + void *p; + nodeset = hwloc_fix_membind(topology, nodeset); + if (!nodeset) + goto fallback; + if (flags & HWLOC_MEMBIND_MIGRATE) { + errno = EINVAL; + goto fallback; + } + + if (topology->binding_hooks.alloc_membind) + return topology->binding_hooks.alloc_membind(topology, len, nodeset, policy, flags); + else if (topology->binding_hooks.set_area_membind) { + p = hwloc_alloc(topology, len); + if (!p) + return NULL; + if (topology->binding_hooks.set_area_membind(topology, p, len, nodeset, policy, flags) && flags & HWLOC_MEMBIND_STRICT) { + int error = errno; + free(p); + errno = error; + return NULL; + } + return p; + } else { + errno = ENOSYS; + } + +fallback: + if (flags & HWLOC_MEMBIND_STRICT) + /* Report error */ + return NULL; + /* Never mind, allocate anyway */ + return hwloc_alloc(topology, len); +} + +void * +hwloc_alloc_membind(hwloc_topology_t topology, size_t len, hwloc_const_cpuset_t set, hwloc_membind_policy_t policy, int flags) +{ + hwloc_nodeset_t nodeset = hwloc_bitmap_alloc(); + void *ret; + + if (hwloc_fix_membind_cpuset(topology, nodeset, set)) { + if (flags & HWLOC_MEMBIND_STRICT) + ret = NULL; + else + ret = hwloc_alloc(topology, len); + } else + ret = hwloc_alloc_membind_nodeset(topology, len, nodeset, policy, flags); + + hwloc_bitmap_free(nodeset); + return ret; +} + +int +hwloc_free(hwloc_topology_t topology, void *addr, size_t len) +{ + if (topology->binding_hooks.free_membind) + return topology->binding_hooks.free_membind(topology, addr, len); + return hwloc_free_heap(topology, addr, len); +} + +/* + * Empty binding hooks always returning success + */ + +static int dontset_return_complete_cpuset(hwloc_topology_t topology, hwloc_cpuset_t set) +{ + hwloc_const_cpuset_t cpuset = hwloc_topology_get_complete_cpuset(topology); + if (cpuset) { + hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); + return 0; + } else + return -1; +} + +static int dontset_thisthread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return 0; +} +static int dontget_thisthread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_bitmap_t set, int flags __hwloc_attribute_unused) +{ + return dontset_return_complete_cpuset(topology, set); +} +static int dontset_thisproc_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return 0; +} +static int dontget_thisproc_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_bitmap_t set, int flags __hwloc_attribute_unused) +{ + return dontset_return_complete_cpuset(topology, set); +} +static int dontset_proc_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_pid_t pid __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return 0; +} +static int dontget_proc_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_pid_t pid __hwloc_attribute_unused, 
hwloc_bitmap_t cpuset, int flags __hwloc_attribute_unused) +{ + return dontset_return_complete_cpuset(topology, cpuset); +} +#ifdef hwloc_thread_t +static int dontset_thread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_thread_t tid __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return 0; +} +static int dontget_thread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_thread_t tid __hwloc_attribute_unused, hwloc_bitmap_t cpuset, int flags __hwloc_attribute_unused) +{ + return dontset_return_complete_cpuset(topology, cpuset); +} +#endif + +static int dontset_return_complete_nodeset(hwloc_topology_t topology, hwloc_nodeset_t set, hwloc_membind_policy_t *policy) +{ + hwloc_const_nodeset_t nodeset = hwloc_topology_get_complete_nodeset(topology); + if (nodeset) { + hwloc_bitmap_copy(set, hwloc_topology_get_complete_nodeset(topology)); + *policy = HWLOC_MEMBIND_DEFAULT; + return 0; + } else + return -1; +} + +static int dontset_thisproc_membind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, hwloc_membind_policy_t policy __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return 0; +} +static int dontget_thisproc_membind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_bitmap_t set, hwloc_membind_policy_t * policy, int flags __hwloc_attribute_unused) +{ + return dontset_return_complete_nodeset(topology, set, policy); +} + +static int dontset_thisthread_membind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, hwloc_membind_policy_t policy __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return 0; +} +static int dontget_thisthread_membind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_bitmap_t set, hwloc_membind_policy_t * policy, int flags __hwloc_attribute_unused) +{ + return dontset_return_complete_nodeset(topology, set, policy); +} + +static int dontset_proc_membind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_pid_t pid __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, hwloc_membind_policy_t policy __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return 0; +} +static int dontget_proc_membind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_pid_t pid __hwloc_attribute_unused, hwloc_bitmap_t set, hwloc_membind_policy_t * policy, int flags __hwloc_attribute_unused) +{ + return dontset_return_complete_nodeset(topology, set, policy); +} + +static int dontset_area_membind(hwloc_topology_t topology __hwloc_attribute_unused, const void *addr __hwloc_attribute_unused, size_t size __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, hwloc_membind_policy_t policy __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return 0; +} +static int dontget_area_membind(hwloc_topology_t topology __hwloc_attribute_unused, const void *addr __hwloc_attribute_unused, size_t size __hwloc_attribute_unused, hwloc_bitmap_t set, hwloc_membind_policy_t * policy, int flags __hwloc_attribute_unused) +{ + return dontset_return_complete_nodeset(topology, set, policy); +} + +static void * dontalloc_membind(hwloc_topology_t topology __hwloc_attribute_unused, size_t size __hwloc_attribute_unused, hwloc_const_bitmap_t set __hwloc_attribute_unused, hwloc_membind_policy_t policy __hwloc_attribute_unused, int flags __hwloc_attribute_unused) +{ + return malloc(size); +} 
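+
+/* Example: a topology loaded from another machine's XML export ends up with
+ * these dummy hooks (sketch; "remote.xml" is an assumed file name):
+ *   hwloc_topology_set_xml(topology, "remote.xml");
+ *   hwloc_topology_load(topology);
+ *   hwloc_set_cpubind(topology, set, 0);   returns 0 without binding anything
+ *   hwloc_get_cpubind(topology, set, 0);   fills set with the complete cpuset
+ */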
+static int dontfree_membind(hwloc_topology_t topology __hwloc_attribute_unused, void *addr __hwloc_attribute_unused, size_t size __hwloc_attribute_unused) +{ + free(addr); + return 0; +} + +static void hwloc_set_dummy_hooks(struct hwloc_binding_hooks *hooks, + struct hwloc_topology_support *support __hwloc_attribute_unused) +{ + hooks->set_thisproc_cpubind = dontset_thisproc_cpubind; + hooks->get_thisproc_cpubind = dontget_thisproc_cpubind; + hooks->set_thisthread_cpubind = dontset_thisthread_cpubind; + hooks->get_thisthread_cpubind = dontget_thisthread_cpubind; + hooks->set_proc_cpubind = dontset_proc_cpubind; + hooks->get_proc_cpubind = dontget_proc_cpubind; +#ifdef hwloc_thread_t + hooks->set_thread_cpubind = dontset_thread_cpubind; + hooks->get_thread_cpubind = dontget_thread_cpubind; +#endif + hooks->get_thisproc_last_cpu_location = dontget_thisproc_cpubind; /* cpubind instead of last_cpu_location is ok */ + hooks->get_thisthread_last_cpu_location = dontget_thisthread_cpubind; /* cpubind instead of last_cpu_location is ok */ + hooks->get_proc_last_cpu_location = dontget_proc_cpubind; /* cpubind instead of last_cpu_location is ok */ + /* TODO: get_thread_last_cpu_location */ + hooks->set_thisproc_membind = dontset_thisproc_membind; + hooks->get_thisproc_membind = dontget_thisproc_membind; + hooks->set_thisthread_membind = dontset_thisthread_membind; + hooks->get_thisthread_membind = dontget_thisthread_membind; + hooks->set_proc_membind = dontset_proc_membind; + hooks->get_proc_membind = dontget_proc_membind; + hooks->set_area_membind = dontset_area_membind; + hooks->get_area_membind = dontget_area_membind; + hooks->alloc_membind = dontalloc_membind; + hooks->free_membind = dontfree_membind; +} + +void +hwloc_set_native_binding_hooks(struct hwloc_binding_hooks *hooks, struct hwloc_topology_support *support) +{ +# ifdef HWLOC_LINUX_SYS + hwloc_set_linuxfs_hooks(hooks, support); +# endif /* HWLOC_LINUX_SYS */ + +# ifdef HWLOC_BGQ_SYS + hwloc_set_bgq_hooks(hooks, support); +# endif /* HWLOC_BGQ_SYS */ + +# ifdef HWLOC_AIX_SYS + hwloc_set_aix_hooks(hooks, support); +# endif /* HWLOC_AIX_SYS */ + +# ifdef HWLOC_OSF_SYS + hwloc_set_osf_hooks(hooks, support); +# endif /* HWLOC_OSF_SYS */ + +# ifdef HWLOC_SOLARIS_SYS + hwloc_set_solaris_hooks(hooks, support); +# endif /* HWLOC_SOLARIS_SYS */ + +# ifdef HWLOC_WIN_SYS + hwloc_set_windows_hooks(hooks, support); +# endif /* HWLOC_WIN_SYS */ + +# ifdef HWLOC_DARWIN_SYS + hwloc_set_darwin_hooks(hooks, support); +# endif /* HWLOC_DARWIN_SYS */ + +# ifdef HWLOC_FREEBSD_SYS + hwloc_set_freebsd_hooks(hooks, support); +# endif /* HWLOC_FREEBSD_SYS */ + +# ifdef HWLOC_NETBSD_SYS + hwloc_set_netbsd_hooks(hooks, support); +# endif /* HWLOC_NETBSD_SYS */ + +# ifdef HWLOC_HPUX_SYS + hwloc_set_hpux_hooks(hooks, support); +# endif /* HWLOC_HPUX_SYS */ +} + +/* If the represented system is actually not this system, use dummy binding hooks. 
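+ * That way binding calls silently succeed instead of binding the local
+ * machine according to some other machine's topology.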
 */
+void
+hwloc_set_binding_hooks(struct hwloc_topology *topology)
+{
+  if (topology->is_thissystem) {
+    hwloc_set_native_binding_hooks(&topology->binding_hooks, &topology->support);
+    /* every hook not set above will return ENOSYS */
+  } else {
+    /* not this system, use dummy binding hooks that do nothing (but don't return ENOSYS) */
+    hwloc_set_dummy_hooks(&topology->binding_hooks, &topology->support);
+  }
+
+  /* if not is_thissystem, set_cpubind is fake
+   * and get_cpubind returns the whole system cpuset,
+   * so don't report that set/get_cpubind as supported
+   */
+  if (topology->is_thissystem) {
+#define DO(which,kind) \
+    if (topology->binding_hooks.kind) \
+      topology->support.which##bind->kind = 1;
+    DO(cpu,set_thisproc_cpubind);
+    DO(cpu,get_thisproc_cpubind);
+    DO(cpu,set_proc_cpubind);
+    DO(cpu,get_proc_cpubind);
+    DO(cpu,set_thisthread_cpubind);
+    DO(cpu,get_thisthread_cpubind);
+#ifdef hwloc_thread_t
+    DO(cpu,set_thread_cpubind);
+    DO(cpu,get_thread_cpubind);
+#endif
+    DO(cpu,get_thisproc_last_cpu_location);
+    DO(cpu,get_proc_last_cpu_location);
+    DO(cpu,get_thisthread_last_cpu_location);
+    DO(mem,set_thisproc_membind);
+    DO(mem,get_thisproc_membind);
+    DO(mem,set_thisthread_membind);
+    DO(mem,get_thisthread_membind);
+    DO(mem,set_proc_membind);
+    DO(mem,get_proc_membind);
+    DO(mem,set_area_membind);
+    DO(mem,get_area_membind);
+    DO(mem,alloc_membind);
+  }
+}
diff --git a/ext/hwloc/src/bitmap.c b/ext/hwloc/src/bitmap.c
new file mode 100644
index 000000000..39f4dbfe3
--- /dev/null
+++ b/ext/hwloc/src/bitmap.c
@@ -0,0 +1,1163 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2011 inria.  All rights reserved.
+ * Copyright © 2009-2011 Université Bordeaux 1
+ * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc/autogen/config.h>
+#include <hwloc.h>
+#include <private/misc.h>
+#include <private/private.h>
+#include <private/debug.h>
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <assert.h>
+#include <errno.h>
+#include <ctype.h>
+
+/* TODO
+ * - have a way to change the initial allocation size
+ * - preallocate inside the bitmap structure (so that the whole structure is a cacheline for instance)
+ *   and allocate a dedicated array only later when reallocating larger
+ */
+
+/* magic number */
+#define HWLOC_BITMAP_MAGIC 0x20091007
+
+/* actual opaque type internals */
+struct hwloc_bitmap_s {
+  unsigned ulongs_count; /* how many ulong bitmasks are valid, >= 1 */
+  unsigned ulongs_allocated; /* how many ulong bitmasks are allocated, >= ulongs_count */
+  unsigned long *ulongs;
+  int infinite; /* set to 1 if all bits beyond ulongs are set */
+#ifdef HWLOC_DEBUG
+  int magic;
+#endif
+};
+
+/* overzealous check in debug-mode, not as powerful as valgrind but still useful */
+#ifdef HWLOC_DEBUG
+#define HWLOC__BITMAP_CHECK(set) do {				\
+  assert((set)->magic == HWLOC_BITMAP_MAGIC);			\
+  assert((set)->ulongs_count >= 1);				\
+  assert((set)->ulongs_allocated >= (set)->ulongs_count);	\
+} while (0)
+#else
+#define HWLOC__BITMAP_CHECK(set)
+#endif
+
+/* extract a subset from a set using an index or a cpu */
+#define HWLOC_SUBBITMAP_INDEX(cpu)	((cpu)/(HWLOC_BITS_PER_LONG))
+#define HWLOC_SUBBITMAP_CPU_ULBIT(cpu)	((cpu)%(HWLOC_BITS_PER_LONG))
+/* Read from a bitmap ulong without knowing whether x is valid.
+ * Writers should make sure that x is valid and modify set->ulongs[x] directly.
+ */
+#define HWLOC_SUBBITMAP_READULONG(set,x)	((x) < (set)->ulongs_count ? (set)->ulongs[x] : (set)->infinite ?
HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO)
+
+/* predefined subset values */
+#define HWLOC_SUBBITMAP_ZERO			0UL
+#define HWLOC_SUBBITMAP_FULL			(~0UL)
+#define HWLOC_SUBBITMAP_ULBIT(bit)		(1UL<<(bit))
+#define HWLOC_SUBBITMAP_CPU(cpu)		HWLOC_SUBBITMAP_ULBIT(HWLOC_SUBBITMAP_CPU_ULBIT(cpu))
+#define HWLOC_SUBBITMAP_ULBIT_TO(bit)		(HWLOC_SUBBITMAP_FULL>>(HWLOC_BITS_PER_LONG-1-(bit)))
+#define HWLOC_SUBBITMAP_ULBIT_FROM(bit)		(HWLOC_SUBBITMAP_FULL<<(bit))
+#define HWLOC_SUBBITMAP_ULBIT_FROMTO(begin,end)	(HWLOC_SUBBITMAP_ULBIT_TO(end) & HWLOC_SUBBITMAP_ULBIT_FROM(begin))
+
+struct hwloc_bitmap_s * hwloc_bitmap_alloc(void)
+{
+  struct hwloc_bitmap_s * set;
+
+  set = malloc(sizeof(struct hwloc_bitmap_s));
+  if (!set)
+    return NULL;
+
+  set->ulongs_count = 1;
+  set->ulongs_allocated = 64/sizeof(unsigned long);
+  set->ulongs = malloc(64);
+  if (!set->ulongs) {
+    free(set);
+    return NULL;
+  }
+
+  set->ulongs[0] = HWLOC_SUBBITMAP_ZERO;
+  set->infinite = 0;
+#ifdef HWLOC_DEBUG
+  set->magic = HWLOC_BITMAP_MAGIC;
+#endif
+  return set;
+}
+
+struct hwloc_bitmap_s * hwloc_bitmap_alloc_full(void)
+{
+  struct hwloc_bitmap_s * set = hwloc_bitmap_alloc();
+  if (set) {
+    set->infinite = 1;
+    set->ulongs[0] = HWLOC_SUBBITMAP_FULL;
+  }
+  return set;
+}
+
+void hwloc_bitmap_free(struct hwloc_bitmap_s * set)
+{
+  if (!set)
+    return;
+
+  HWLOC__BITMAP_CHECK(set);
+#ifdef HWLOC_DEBUG
+  set->magic = 0;
+#endif
+
+  free(set->ulongs);
+  free(set);
+}
+
+/* enlarge until it contains at least needed_count ulongs.
+ */
+static void
+hwloc_bitmap_enlarge_by_ulongs(struct hwloc_bitmap_s * set, unsigned needed_count)
+{
+  unsigned tmp = 1 << hwloc_flsl((unsigned long) needed_count - 1);
+  if (tmp > set->ulongs_allocated) {
+    set->ulongs = realloc(set->ulongs, tmp * sizeof(unsigned long));
+    assert(set->ulongs);
+    set->ulongs_allocated = tmp;
+  }
+}
+
+/* enlarge until it contains at least needed_count ulongs,
+ * and update new ulongs according to the infinite field.
+ */
+static void
+hwloc_bitmap_realloc_by_ulongs(struct hwloc_bitmap_s * set, unsigned needed_count)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  if (needed_count <= set->ulongs_count)
+    return;
+
+  /* realloc larger if needed */
+  hwloc_bitmap_enlarge_by_ulongs(set, needed_count);
+
+  /* fill the newly allocated subset depending on the infinite flag */
+  for(i=set->ulongs_count; i<needed_count; i++)
+    set->ulongs[i] = set->infinite ? HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO;
+  set->ulongs_count = needed_count;
+}
+
+/* realloc until it contains at least cpu+1 bits */
+#define hwloc_bitmap_realloc_by_cpu_index(set, cpu) hwloc_bitmap_realloc_by_ulongs(set, ((cpu)/HWLOC_BITS_PER_LONG)+1)
+
+/* reset a bitmap to exactly the needed size.
+ * the caller must reinitialize all ulongs and the infinite flag later.
+ */
+static void
+hwloc_bitmap_reset_by_ulongs(struct hwloc_bitmap_s * set, unsigned needed_count)
+{
+  hwloc_bitmap_enlarge_by_ulongs(set, needed_count);
+  set->ulongs_count = needed_count;
+}
+
+/* reset until it contains exactly cpu+1 bits (roundup to a ulong).
+ * the caller must reinitialize all ulongs and the infinite flag later.
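+ * e.g. cpu==0 keeps a single ulong, while cpu==HWLOC_BITS_PER_LONG needs two.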
+ */ +#define hwloc_bitmap_reset_by_cpu_index(set, cpu) hwloc_bitmap_reset_by_ulongs(set, ((cpu)/HWLOC_BITS_PER_LONG)+1) + +struct hwloc_bitmap_s * hwloc_bitmap_dup(const struct hwloc_bitmap_s * old) +{ + struct hwloc_bitmap_s * new; + + if (!old) + return NULL; + + HWLOC__BITMAP_CHECK(old); + + new = malloc(sizeof(struct hwloc_bitmap_s)); + if (!new) + return NULL; + + new->ulongs = malloc(old->ulongs_allocated * sizeof(unsigned long)); + if (!new->ulongs) { + free(new); + return NULL; + } + new->ulongs_allocated = old->ulongs_allocated; + new->ulongs_count = old->ulongs_count; + memcpy(new->ulongs, old->ulongs, new->ulongs_count * sizeof(unsigned long)); + new->infinite = old->infinite; +#ifdef HWLOC_DEBUG + new->magic = HWLOC_BITMAP_MAGIC; +#endif + return new; +} + +void hwloc_bitmap_copy(struct hwloc_bitmap_s * dst, const struct hwloc_bitmap_s * src) +{ + HWLOC__BITMAP_CHECK(dst); + HWLOC__BITMAP_CHECK(src); + + hwloc_bitmap_reset_by_ulongs(dst, src->ulongs_count); + + memcpy(dst->ulongs, src->ulongs, src->ulongs_count * sizeof(unsigned long)); + dst->infinite = src->infinite; +} + +/* Strings always use 32bit groups */ +#define HWLOC_PRIxSUBBITMAP "%08lx" +#define HWLOC_BITMAP_SUBSTRING_SIZE 32 +#define HWLOC_BITMAP_SUBSTRING_LENGTH (HWLOC_BITMAP_SUBSTRING_SIZE/4) +#define HWLOC_BITMAP_STRING_PER_LONG (HWLOC_BITS_PER_LONG/HWLOC_BITMAP_SUBSTRING_SIZE) + +int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buflen, const struct hwloc_bitmap_s * __hwloc_restrict set) +{ + ssize_t size = buflen; + char *tmp = buf; + int res, ret = 0; + int needcomma = 0; + int i; + unsigned long accum = 0; + int accumed = 0; +#if HWLOC_BITS_PER_LONG == HWLOC_BITMAP_SUBSTRING_SIZE + const unsigned long accum_mask = ~0UL; +#else /* HWLOC_BITS_PER_LONG != HWLOC_BITMAP_SUBSTRING_SIZE */ + const unsigned long accum_mask = ((1UL << HWLOC_BITMAP_SUBSTRING_SIZE) - 1) << (HWLOC_BITS_PER_LONG - HWLOC_BITMAP_SUBSTRING_SIZE); +#endif /* HWLOC_BITS_PER_LONG != HWLOC_BITMAP_SUBSTRING_SIZE */ + + HWLOC__BITMAP_CHECK(set); + + /* mark the end in case we do nothing later */ + if (buflen > 0) + tmp[0] = '\0'; + + if (set->infinite) { + res = hwloc_snprintf(tmp, size, "0xf...f"); + needcomma = 1; + if (res < 0) + return -1; + ret += res; + if (res >= size) + res = size>0 ? size - 1 : 0; + tmp += res; + size -= res; + /* optimize a common case: full bitmap should appear as 0xf...f instead of 0xf...f,0xffffffff */ + if (set->ulongs_count == 1 && set->ulongs[0] == HWLOC_SUBBITMAP_FULL) + return ret; + } + + i=set->ulongs_count-1; + while (i>=0 || accumed) { + /* Refill accumulator */ + if (!accumed) { + accum = set->ulongs[i--]; + accumed = HWLOC_BITS_PER_LONG; + } + + if (accum & accum_mask) { + /* print the whole subset if not empty */ + res = hwloc_snprintf(tmp, size, needcomma ? ",0x" HWLOC_PRIxSUBBITMAP : "0x" HWLOC_PRIxSUBBITMAP, + (accum & accum_mask) >> (HWLOC_BITS_PER_LONG - HWLOC_BITMAP_SUBSTRING_SIZE)); + needcomma = 1; + } else if (i == -1 && accumed == HWLOC_BITMAP_SUBSTRING_SIZE) { + /* print a single 0 to mark the last subset */ + res = hwloc_snprintf(tmp, size, needcomma ? ",0x0" : "0x0"); + } else if (needcomma) { + res = hwloc_snprintf(tmp, size, ","); + } else { + res = 0; + } + if (res < 0) + return -1; + ret += res; + +#if HWLOC_BITS_PER_LONG == HWLOC_BITMAP_SUBSTRING_SIZE + accum = 0; + accumed = 0; +#else + accum <<= HWLOC_BITMAP_SUBSTRING_SIZE; + accumed -= HWLOC_BITMAP_SUBSTRING_SIZE; +#endif + + if (res >= size) + res = size>0 ? 
size - 1 : 0; + + tmp += res; + size -= res; + } + + return ret; +} + +int hwloc_bitmap_asprintf(char ** strp, const struct hwloc_bitmap_s * __hwloc_restrict set) +{ + int len; + char *buf; + + HWLOC__BITMAP_CHECK(set); + + len = hwloc_bitmap_snprintf(NULL, 0, set); + buf = malloc(len+1); + *strp = buf; + return hwloc_bitmap_snprintf(buf, len+1, set); +} + +int hwloc_bitmap_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc_restrict string) +{ + const char * current = string; + unsigned long accum = 0; + int count=0; + int infinite = 0; + + /* count how many substrings there are */ + count++; + while ((current = strchr(current+1, ',')) != NULL) + count++; + + current = string; + if (!strncmp("0xf...f", current, 7)) { + current += 7; + if (*current != ',') { + /* special case for infinite/full bitmap */ + hwloc_bitmap_fill(set); + return 0; + } + current++; + infinite = 1; + count--; + } + + hwloc_bitmap_reset_by_ulongs(set, (count + HWLOC_BITMAP_STRING_PER_LONG - 1) / HWLOC_BITMAP_STRING_PER_LONG); + set->infinite = 0; + + while (*current != '\0') { + unsigned long val; + char *next; + val = strtoul(current, &next, 16); + + assert(count > 0); + count--; + + accum |= (val << ((count * HWLOC_BITMAP_SUBSTRING_SIZE) % HWLOC_BITS_PER_LONG)); + if (!(count % HWLOC_BITMAP_STRING_PER_LONG)) { + set->ulongs[count / HWLOC_BITMAP_STRING_PER_LONG] = accum; + accum = 0; + } + + if (*next != ',') { + if (*next || count > 0) + goto failed; + else + break; + } + current = (const char*) next+1; + } + + set->infinite = infinite; /* set at the end, to avoid spurious realloc with filled new ulongs */ + + return 0; + + failed: + /* failure to parse */ + hwloc_bitmap_zero(set); + return -1; +} + +int hwloc_bitmap_list_snprintf(char * __hwloc_restrict buf, size_t buflen, const struct hwloc_bitmap_s * __hwloc_restrict set) +{ + int prev = -1; + hwloc_bitmap_t reverse; + ssize_t size = buflen; + char *tmp = buf; + int res, ret = 0; + int needcomma = 0; + + HWLOC__BITMAP_CHECK(set); + + reverse = hwloc_bitmap_alloc(); /* FIXME: add hwloc_bitmap_alloc_size() + hwloc_bitmap_init_allocated() to avoid malloc? */ + hwloc_bitmap_not(reverse, set); + + /* mark the end in case we do nothing later */ + if (buflen > 0) + tmp[0] = '\0'; + + while (1) { + int begin, end; + + begin = hwloc_bitmap_next(set, prev); + if (begin == -1) + break; + end = hwloc_bitmap_next(reverse, begin); + + if (end == begin+1) { + res = hwloc_snprintf(tmp, size, needcomma ? ",%d" : "%d", begin); + } else if (end == -1) { + res = hwloc_snprintf(tmp, size, needcomma ? ",%d-" : "%d-", begin); + } else { + res = hwloc_snprintf(tmp, size, needcomma ? ",%d-%d" : "%d-%d", begin, end-1); + } + if (res < 0) { + hwloc_bitmap_free(reverse); + return -1; + } + ret += res; + + if (res >= size) + res = size>0 ? 
size - 1 : 0; + + tmp += res; + size -= res; + needcomma = 1; + + if (end == -1) + break; + else + prev = end - 1; + } + + hwloc_bitmap_free(reverse); + + return ret; +} + +int hwloc_bitmap_list_asprintf(char ** strp, const struct hwloc_bitmap_s * __hwloc_restrict set) +{ + int len; + char *buf; + + HWLOC__BITMAP_CHECK(set); + + len = hwloc_bitmap_list_snprintf(NULL, 0, set); + buf = malloc(len+1); + *strp = buf; + return hwloc_bitmap_list_snprintf(buf, len+1, set); +} + +int hwloc_bitmap_list_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc_restrict string) +{ + const char * current = string; + char *next; + long begin = -1, val; + + hwloc_bitmap_zero(set); + + while (*current != '\0') { + + /* ignore empty ranges */ + while (*current == ',') + current++; + + val = strtoul(current, &next, 0); + /* make sure we got at least one digit */ + if (next == current) + goto failed; + + if (begin != -1) { + /* finishing a range */ + hwloc_bitmap_set_range(set, begin, val); + begin = -1; + + } else if (*next == '-') { + /* starting a new range */ + if (*(next+1) == '\0') { + /* infinite range */ + hwloc_bitmap_set_range(set, val, -1); + break; + } else { + /* normal range */ + begin = val; + } + + } else if (*next == ',' || *next == '\0') { + /* single digit */ + hwloc_bitmap_set(set, val); + } + + if (*next == '\0') + break; + current = next+1; + } + + return 0; + + failed: + /* failure to parse */ + hwloc_bitmap_zero(set); + return -1; +} + +int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, size_t buflen, const struct hwloc_bitmap_s * __hwloc_restrict set) +{ + ssize_t size = buflen; + char *tmp = buf; + int res, ret = 0; + int started = 0; + int i; + + HWLOC__BITMAP_CHECK(set); + + /* mark the end in case we do nothing later */ + if (buflen > 0) + tmp[0] = '\0'; + + if (set->infinite) { + res = hwloc_snprintf(tmp, size, "0xf...f"); + started = 1; + if (res < 0) + return -1; + ret += res; + if (res >= size) + res = size>0 ? size - 1 : 0; + tmp += res; + size -= res; + /* optimize a common case: full bitmap should appear as 0xf...f instead of 0xf...fffffffff */ + if (set->ulongs_count == 1 && set->ulongs[0] == HWLOC_SUBBITMAP_FULL) + return ret; + } + + i=set->ulongs_count-1; + while (i>=0) { + unsigned long val = set->ulongs[i--]; + if (started) { + /* print the whole subset */ +#if HWLOC_BITS_PER_LONG == 64 + res = hwloc_snprintf(tmp, size, "%016lx", val); +#else + res = hwloc_snprintf(tmp, size, "%08lx", val); +#endif + } else if (val || i == -1) { + res = hwloc_snprintf(tmp, size, "0x%lx", val); + started = 1; + } else { + res = 0; + } + if (res < 0) + return -1; + ret += res; + if (res >= size) + res = size>0 ? 
size - 1 : 0;
+    tmp += res;
+    size -= res;
+  }
+
+  return ret;
+}
+
+int hwloc_bitmap_taskset_asprintf(char ** strp, const struct hwloc_bitmap_s * __hwloc_restrict set)
+{
+  int len;
+  char *buf;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  len = hwloc_bitmap_taskset_snprintf(NULL, 0, set);
+  buf = malloc(len+1);
+  *strp = buf;
+  return hwloc_bitmap_taskset_snprintf(buf, len+1, set);
+}
+
+int hwloc_bitmap_taskset_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc_restrict string)
+{
+  const char * current = string;
+  int chars;
+  int count;
+  int infinite = 0;
+
+  current = string;
+  if (!strncmp("0xf...f", current, 7)) {
+    /* infinite bitmap */
+    infinite = 1;
+    current += 7;
+    if (*current == '\0') {
+      /* special case for infinite/full bitmap */
+      hwloc_bitmap_fill(set);
+      return 0;
+    }
+  } else {
+    /* finite bitmap */
+    if (!strncmp("0x", current, 2))
+      current += 2;
+    if (*current == '\0') {
+      /* special case for empty bitmap */
+      hwloc_bitmap_zero(set);
+      return 0;
+    }
+  }
+  /* we know there are other characters now */
+
+  chars = strlen(current);
+  count = (chars * 4 + HWLOC_BITS_PER_LONG - 1) / HWLOC_BITS_PER_LONG;
+
+  hwloc_bitmap_reset_by_ulongs(set, count);
+  set->infinite = 0;
+
+  while (*current != '\0') {
+    int tmpchars;
+    char ustr[17];
+    unsigned long val;
+    char *next;
+
+    tmpchars = chars % (HWLOC_BITS_PER_LONG/4);
+    if (!tmpchars)
+      tmpchars = (HWLOC_BITS_PER_LONG/4);
+
+    memcpy(ustr, current, tmpchars);
+    ustr[tmpchars] = '\0';
+    val = strtoul(ustr, &next, 16);
+    if (*next != '\0')
+      goto failed;
+
+    set->ulongs[count-1] = val;
+
+    current += tmpchars;
+    chars -= tmpchars;
+    count--;
+  }
+
+  set->infinite = infinite; /* set at the end, to avoid spurious realloc with filled new ulongs */
+
+  return 0;
+
+ failed:
+  /* failure to parse */
+  hwloc_bitmap_zero(set);
+  return -1;
+}
+
+static void hwloc_bitmap__zero(struct hwloc_bitmap_s *set)
+{
+  unsigned i;
+  for(i=0; i<set->ulongs_count; i++)
+    set->ulongs[i] = HWLOC_SUBBITMAP_ZERO;
+  set->infinite = 0;
+}
+
+void hwloc_bitmap_zero(struct hwloc_bitmap_s * set)
+{
+  HWLOC__BITMAP_CHECK(set);
+
+  hwloc_bitmap_reset_by_ulongs(set, 1);
+  hwloc_bitmap__zero(set);
+}
+
+static void hwloc_bitmap__fill(struct hwloc_bitmap_s * set)
+{
+  unsigned i;
+  for(i=0; i<set->ulongs_count; i++)
+    set->ulongs[i] = HWLOC_SUBBITMAP_FULL;
+  set->infinite = 1;
+}
+
+void hwloc_bitmap_fill(struct hwloc_bitmap_s * set)
+{
+  HWLOC__BITMAP_CHECK(set);
+
+  hwloc_bitmap_reset_by_ulongs(set, 1);
+  hwloc_bitmap__fill(set);
+}
+
+void hwloc_bitmap_from_ulong(struct hwloc_bitmap_s *set, unsigned long mask)
+{
+  HWLOC__BITMAP_CHECK(set);
+
+  hwloc_bitmap_reset_by_ulongs(set, 1);
+  set->ulongs[0] = mask; /* there's always at least one ulong allocated */
+  set->infinite = 0;
+}
+
+void hwloc_bitmap_from_ith_ulong(struct hwloc_bitmap_s *set, unsigned i, unsigned long mask)
+{
+  unsigned j;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  hwloc_bitmap_reset_by_ulongs(set, i+1);
+  set->ulongs[i] = mask;
+  for(j=0; j<i; j++)
+    set->ulongs[j] = HWLOC_SUBBITMAP_ZERO;
+  set->infinite = 0;
+}
+
+unsigned long hwloc_bitmap_to_ulong(const struct hwloc_bitmap_s *set)
+{
+  HWLOC__BITMAP_CHECK(set);
+
+  return set->ulongs[0]; /* there's always at least one ulong allocated */
+}
+
+unsigned long hwloc_bitmap_to_ith_ulong(const struct hwloc_bitmap_s *set, unsigned i)
+{
+  HWLOC__BITMAP_CHECK(set);
+
+  return HWLOC_SUBBITMAP_READULONG(set, i);
+}
+
+void hwloc_bitmap_only(struct hwloc_bitmap_s * set, unsigned cpu)
+{
+  unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu);
+
+  HWLOC__BITMAP_CHECK(set);
+
+
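/* restrict storage to the ulong that holds cpu, clear it entirely, then set that single bit */
+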
hwloc_bitmap_reset_by_cpu_index(set, cpu);
+  hwloc_bitmap__zero(set);
+  set->ulongs[index_] |= HWLOC_SUBBITMAP_CPU(cpu);
+}
+
+void hwloc_bitmap_allbut(struct hwloc_bitmap_s * set, unsigned cpu)
+{
+  unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu);
+
+  HWLOC__BITMAP_CHECK(set);
+
+  hwloc_bitmap_reset_by_cpu_index(set, cpu);
+  hwloc_bitmap__fill(set);
+  set->ulongs[index_] &= ~HWLOC_SUBBITMAP_CPU(cpu);
+}
+
+void hwloc_bitmap_set(struct hwloc_bitmap_s * set, unsigned cpu)
+{
+  unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu);
+
+  HWLOC__BITMAP_CHECK(set);
+
+  /* nothing to do if setting inside the infinite part of the bitmap */
+  if (set->infinite && cpu >= set->ulongs_count * HWLOC_BITS_PER_LONG)
+    return;
+
+  hwloc_bitmap_realloc_by_cpu_index(set, cpu);
+  set->ulongs[index_] |= HWLOC_SUBBITMAP_CPU(cpu);
+}
+
+void hwloc_bitmap_set_range(struct hwloc_bitmap_s * set, unsigned begincpu, int _endcpu)
+{
+  unsigned i;
+  unsigned beginset,endset;
+  unsigned endcpu = (unsigned) _endcpu;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  if (_endcpu == -1) {
+    set->infinite = 1;
+    /* keep endcpu == -1 since this unsigned is actually larger than anything else */
+  }
+
+  if (set->infinite) {
+    /* truncate the range according to the infinite part of the bitmap */
+    if (endcpu >= set->ulongs_count * HWLOC_BITS_PER_LONG)
+      endcpu = set->ulongs_count * HWLOC_BITS_PER_LONG - 1;
+    if (begincpu >= set->ulongs_count * HWLOC_BITS_PER_LONG)
+      return;
+  }
+  if (endcpu < begincpu)
+    return;
+  hwloc_bitmap_realloc_by_cpu_index(set, endcpu);
+
+  beginset = HWLOC_SUBBITMAP_INDEX(begincpu);
+  endset = HWLOC_SUBBITMAP_INDEX(endcpu);
+  for(i=beginset+1; i<endset; i++)
+    set->ulongs[i] = HWLOC_SUBBITMAP_FULL;
+  if (beginset == endset) {
+    set->ulongs[beginset] |= HWLOC_SUBBITMAP_ULBIT_FROMTO(HWLOC_SUBBITMAP_CPU_ULBIT(begincpu), HWLOC_SUBBITMAP_CPU_ULBIT(endcpu));
+  } else {
+    set->ulongs[beginset] |= HWLOC_SUBBITMAP_ULBIT_FROM(HWLOC_SUBBITMAP_CPU_ULBIT(begincpu));
+    set->ulongs[endset] |= HWLOC_SUBBITMAP_ULBIT_TO(HWLOC_SUBBITMAP_CPU_ULBIT(endcpu));
+  }
+}
+
+void hwloc_bitmap_set_ith_ulong(struct hwloc_bitmap_s *set, unsigned i, unsigned long mask)
+{
+  HWLOC__BITMAP_CHECK(set);
+
+  hwloc_bitmap_realloc_by_ulongs(set, i+1);
+  set->ulongs[i] = mask;
+}
+
+void hwloc_bitmap_clr(struct hwloc_bitmap_s * set, unsigned cpu)
+{
+  unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu);
+
+  HWLOC__BITMAP_CHECK(set);
+
+  /* nothing to do if clearing inside the infinitely-unset part of the bitmap */
+  if (!set->infinite && cpu >= set->ulongs_count * HWLOC_BITS_PER_LONG)
+    return;
+
+  hwloc_bitmap_realloc_by_cpu_index(set, cpu);
+  set->ulongs[index_] &= ~HWLOC_SUBBITMAP_CPU(cpu);
+}
+
+void hwloc_bitmap_clr_range(struct hwloc_bitmap_s * set, unsigned begincpu, int _endcpu)
+{
+  unsigned i;
+  unsigned beginset,endset;
+  unsigned endcpu = (unsigned) _endcpu;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  if (_endcpu == -1) {
+    set->infinite = 0;
+    /* keep endcpu == -1 since this unsigned is actually larger than anything else */
+  }
+
+  if (!set->infinite) {
+    /* truncate the range according to the infinitely-unset part of the bitmap */
+    if (endcpu >= set->ulongs_count * HWLOC_BITS_PER_LONG)
+      endcpu = set->ulongs_count * HWLOC_BITS_PER_LONG - 1;
+    if (begincpu >= set->ulongs_count * HWLOC_BITS_PER_LONG)
+      return;
+  }
+  if (endcpu < begincpu)
+    return;
+  hwloc_bitmap_realloc_by_cpu_index(set, endcpu);
+
+  beginset = HWLOC_SUBBITMAP_INDEX(begincpu);
+  endset = HWLOC_SUBBITMAP_INDEX(endcpu);
+  for(i=beginset+1; i<endset; i++)
+    set->ulongs[i] = HWLOC_SUBBITMAP_ZERO;
+  if (beginset == endset) {
+    set->ulongs[beginset] &=
~HWLOC_SUBBITMAP_ULBIT_FROMTO(HWLOC_SUBBITMAP_CPU_ULBIT(begincpu), HWLOC_SUBBITMAP_CPU_ULBIT(endcpu));
+  } else {
+    set->ulongs[beginset] &= ~HWLOC_SUBBITMAP_ULBIT_FROM(HWLOC_SUBBITMAP_CPU_ULBIT(begincpu));
+    set->ulongs[endset] &= ~HWLOC_SUBBITMAP_ULBIT_TO(HWLOC_SUBBITMAP_CPU_ULBIT(endcpu));
+  }
+}
+
+int hwloc_bitmap_isset(const struct hwloc_bitmap_s * set, unsigned cpu)
+{
+  unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu);
+
+  HWLOC__BITMAP_CHECK(set);
+
+  return (HWLOC_SUBBITMAP_READULONG(set, index_) & HWLOC_SUBBITMAP_CPU(cpu)) != 0;
+}
+
+int hwloc_bitmap_iszero(const struct hwloc_bitmap_s *set)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  if (set->infinite)
+    return 0;
+  for(i=0; i<set->ulongs_count; i++)
+    if (set->ulongs[i] != HWLOC_SUBBITMAP_ZERO)
+      return 0;
+  return 1;
+}
+
+int hwloc_bitmap_isfull(const struct hwloc_bitmap_s *set)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  if (!set->infinite)
+    return 0;
+  for(i=0; i<set->ulongs_count; i++)
+    if (set->ulongs[i] != HWLOC_SUBBITMAP_FULL)
+      return 0;
+  return 1;
+}
+
+int hwloc_bitmap_isequal (const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(set1);
+  HWLOC__BITMAP_CHECK(set2);
+
+  for(i=0; i<set1->ulongs_count || i<set2->ulongs_count; i++)
+    if (HWLOC_SUBBITMAP_READULONG(set1, i) != HWLOC_SUBBITMAP_READULONG(set2, i))
+      return 0;
+
+  if (set1->infinite != set2->infinite)
+    return 0;
+
+  return 1;
+}
+
+int hwloc_bitmap_intersects (const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(set1);
+  HWLOC__BITMAP_CHECK(set2);
+
+  for(i=0; i<set1->ulongs_count || i<set2->ulongs_count; i++)
+    if ((HWLOC_SUBBITMAP_READULONG(set1, i) & HWLOC_SUBBITMAP_READULONG(set2, i)) != HWLOC_SUBBITMAP_ZERO)
+      return 1;
+
+  if (set1->infinite && set2->infinite)
+    /* both infinite: they necessarily share bits in the infinite part */
+    return 1;
+
+  return 0;
+}
+
+int hwloc_bitmap_isincluded (const struct hwloc_bitmap_s *sub_set, const struct hwloc_bitmap_s *super_set)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(sub_set);
+  HWLOC__BITMAP_CHECK(super_set);
+
+  for(i=0; i<sub_set->ulongs_count; i++)
+    if (HWLOC_SUBBITMAP_READULONG(super_set, i) != (HWLOC_SUBBITMAP_READULONG(super_set, i) | HWLOC_SUBBITMAP_READULONG(sub_set, i)))
+      return 0;
+
+  if (sub_set->infinite && !super_set->infinite)
+    return 0;
+
+  return 1;
+}
+
+void hwloc_bitmap_or (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2)
+{
+  const struct hwloc_bitmap_s *largest = set1->ulongs_count > set2->ulongs_count ? set1 : set2;
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(res);
+  HWLOC__BITMAP_CHECK(set1);
+  HWLOC__BITMAP_CHECK(set2);
+
+  hwloc_bitmap_realloc_by_ulongs(res, largest->ulongs_count); /* cannot reset since the output may also be an input */
+
+  for(i=0; i<res->ulongs_count; i++)
+    res->ulongs[i] = HWLOC_SUBBITMAP_READULONG(set1, i) | HWLOC_SUBBITMAP_READULONG(set2, i);
+
+  res->infinite = set1->infinite || set2->infinite;
+}
+
+void hwloc_bitmap_and (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2)
+{
+  const struct hwloc_bitmap_s *largest = set1->ulongs_count > set2->ulongs_count ?
set1 : set2;
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(res);
+  HWLOC__BITMAP_CHECK(set1);
+  HWLOC__BITMAP_CHECK(set2);
+
+  hwloc_bitmap_realloc_by_ulongs(res, largest->ulongs_count); /* cannot reset since the output may also be an input */
+
+  for(i=0; i<res->ulongs_count; i++)
+    res->ulongs[i] = HWLOC_SUBBITMAP_READULONG(set1, i) & HWLOC_SUBBITMAP_READULONG(set2, i);
+
+  res->infinite = set1->infinite && set2->infinite;
+}
+
+void hwloc_bitmap_andnot (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2)
+{
+  const struct hwloc_bitmap_s *largest = set1->ulongs_count > set2->ulongs_count ? set1 : set2;
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(res);
+  HWLOC__BITMAP_CHECK(set1);
+  HWLOC__BITMAP_CHECK(set2);
+
+  hwloc_bitmap_realloc_by_ulongs(res, largest->ulongs_count); /* cannot reset since the output may also be an input */
+
+  for(i=0; i<res->ulongs_count; i++)
+    res->ulongs[i] = HWLOC_SUBBITMAP_READULONG(set1, i) & ~HWLOC_SUBBITMAP_READULONG(set2, i);
+
+  res->infinite = set1->infinite && !set2->infinite;
+}
+
+void hwloc_bitmap_xor (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2)
+{
+  const struct hwloc_bitmap_s *largest = set1->ulongs_count > set2->ulongs_count ? set1 : set2;
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(res);
+  HWLOC__BITMAP_CHECK(set1);
+  HWLOC__BITMAP_CHECK(set2);
+
+  hwloc_bitmap_realloc_by_ulongs(res, largest->ulongs_count); /* cannot reset since the output may also be an input */
+
+  for(i=0; i<res->ulongs_count; i++)
+    res->ulongs[i] = HWLOC_SUBBITMAP_READULONG(set1, i) ^ HWLOC_SUBBITMAP_READULONG(set2, i);
+
+  res->infinite = (!set1->infinite) != (!set2->infinite);
+}
+
+void hwloc_bitmap_not (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(res);
+  HWLOC__BITMAP_CHECK(set);
+
+  hwloc_bitmap_realloc_by_ulongs(res, set->ulongs_count); /* cannot reset since the output may also be an input */
+
+  for(i=0; i<res->ulongs_count; i++)
+    res->ulongs[i] = ~HWLOC_SUBBITMAP_READULONG(set, i);
+
+  res->infinite = !set->infinite;
+}
+
+int hwloc_bitmap_first(const struct hwloc_bitmap_s * set)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  for(i=0; i<set->ulongs_count; i++) {
+    /* subsets are unsigned longs, use ffsl */
+    unsigned long w = set->ulongs[i];
+    if (w)
+      return hwloc_ffsl(w) - 1 + HWLOC_BITS_PER_LONG*i;
+  }
+
+  if (set->infinite)
+    return set->ulongs_count * HWLOC_BITS_PER_LONG;
+
+  return -1;
+}
+
+int hwloc_bitmap_last(const struct hwloc_bitmap_s * set)
+{
+  int i;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  if (set->infinite)
+    return -1;
+
+  for(i=set->ulongs_count-1; i>=0; i--) {
+    /* subsets are unsigned longs, use flsl */
+    unsigned long w = set->ulongs[i];
+    if (w)
+      return hwloc_flsl(w) - 1 + HWLOC_BITS_PER_LONG*i;
+  }
+
+  return -1;
+}
+
+int hwloc_bitmap_next(const struct hwloc_bitmap_s * set, int prev_cpu)
+{
+  unsigned i = HWLOC_SUBBITMAP_INDEX(prev_cpu + 1);
+
+  HWLOC__BITMAP_CHECK(set);
+
+  if (i >= set->ulongs_count) {
+    if (set->infinite)
+      return prev_cpu + 1;
+    else
+      return -1;
+  }
+
+  for(; i<set->ulongs_count; i++) {
+    /* subsets are unsigned longs, use ffsl */
+    unsigned long w = set->ulongs[i];
+
+    /* if the prev cpu is in the same word as the possible next one,
+       we need to mask out previous cpus */
+    if (prev_cpu >= 0 && HWLOC_SUBBITMAP_INDEX((unsigned) prev_cpu) == i)
+      w &= ~HWLOC_SUBBITMAP_ULBIT_TO(HWLOC_SUBBITMAP_CPU_ULBIT(prev_cpu));
+
+    if (w)
+      return hwloc_ffsl(w) - 1 + HWLOC_BITS_PER_LONG*i;
+  }
+
+  if (set->infinite)
+    return
set->ulongs_count * HWLOC_BITS_PER_LONG;
+
+  return -1;
+}
+
+void hwloc_bitmap_singlify(struct hwloc_bitmap_s * set)
+{
+  unsigned i;
+  int found = 0;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  for(i=0; i<set->ulongs_count; i++) {
+    if (found) {
+      set->ulongs[i] = HWLOC_SUBBITMAP_ZERO;
+      continue;
+    } else {
+      /* subsets are unsigned longs, use ffsl */
+      unsigned long w = set->ulongs[i];
+      if (w) {
+	int _ffs = hwloc_ffsl(w);
+	set->ulongs[i] = HWLOC_SUBBITMAP_CPU(_ffs-1);
+	found = 1;
+      }
+    }
+  }
+
+  if (set->infinite) {
+    if (found) {
+      set->infinite = 0;
+    } else {
+      /* set the first non allocated bit */
+      unsigned first = set->ulongs_count * HWLOC_BITS_PER_LONG;
+      set->infinite = 0; /* do not let realloc fill the newly allocated sets */
+      hwloc_bitmap_set(set, first);
+    }
+  }
+}
+
+int hwloc_bitmap_compare_first(const struct hwloc_bitmap_s * set1, const struct hwloc_bitmap_s * set2)
+{
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(set1);
+  HWLOC__BITMAP_CHECK(set2);
+
+  for(i=0; i<set1->ulongs_count || i<set2->ulongs_count; i++) {
+    unsigned long w1 = HWLOC_SUBBITMAP_READULONG(set1, i);
+    unsigned long w2 = HWLOC_SUBBITMAP_READULONG(set2, i);
+    if (w1 || w2) {
+      int _ffs1 = hwloc_ffsl(w1);
+      int _ffs2 = hwloc_ffsl(w2);
+      /* if both have a bit set, compare for real */
+      if (_ffs1 && _ffs2)
+	return _ffs1-_ffs2;
+      /* one is empty, and it is considered higher, so reverse-compare them */
+      return _ffs2-_ffs1;
+    }
+  }
+  if ((!set1->infinite) != (!set2->infinite))
+    return !!set1->infinite - !!set2->infinite;
+  return 0;
+}
+
+int hwloc_bitmap_compare(const struct hwloc_bitmap_s * set1, const struct hwloc_bitmap_s * set2)
+{
+  const struct hwloc_bitmap_s *largest = set1->ulongs_count > set2->ulongs_count ? set1 : set2;
+  int i;
+
+  HWLOC__BITMAP_CHECK(set1);
+  HWLOC__BITMAP_CHECK(set2);
+
+  if ((!set1->infinite) != (!set2->infinite))
+    return !!set1->infinite - !!set2->infinite;
+
+  for(i=largest->ulongs_count-1; i>=0; i--) {
+    unsigned long val1 = HWLOC_SUBBITMAP_READULONG(set1, (unsigned) i);
+    unsigned long val2 = HWLOC_SUBBITMAP_READULONG(set2, (unsigned) i);
+    if (val1 == val2)
+      continue;
+    return val1 < val2 ? -1 : 1;
+  }
+
+  return 0;
+}
+
+int hwloc_bitmap_weight(const struct hwloc_bitmap_s * set)
+{
+  int weight = 0;
+  unsigned i;
+
+  HWLOC__BITMAP_CHECK(set);
+
+  if (set->infinite)
+    return -1;
+
+  for(i=0; i<set->ulongs_count; i++)
+    weight += hwloc_weight_long(set->ulongs[i]);
+  return weight;
+}
diff --git a/ext/hwloc/src/components.c b/ext/hwloc/src/components.c
new file mode 100644
index 000000000..14112073a
--- /dev/null
+++ b/ext/hwloc/src/components.c
@@ -0,0 +1,746 @@
+/*
+ * Copyright © 2009-2013 Inria.  All rights reserved.
+ * Copyright © 2012 Université Bordeaux 1
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <private/private.h>
+#include <private/misc.h>
+
+#define HWLOC_COMPONENT_STOP_NAME "stop"
+#define HWLOC_COMPONENT_EXCLUDE_CHAR '-'
+#define HWLOC_COMPONENT_SEPS ","
+
+/* list of all registered discovery components, sorted by priority, higher priority first.
+ * noos is last because its priority is 0.
+ * others' priority is 10.
+ */
+static struct hwloc_disc_component * hwloc_disc_components = NULL;
+
+static unsigned hwloc_components_users = 0; /* first one initializes, last one destroys */
+
+static int hwloc_components_verbose = 0;
+#ifdef HWLOC_HAVE_PLUGINS
+static int hwloc_plugins_verbose = 0;
+#endif
+
+#ifdef HWLOC_WIN_SYS
+/* Basic mutex on top of InterlockedCompareExchange() on windows,
+ * Far from perfect, but easy to maintain, and way enough given that this code will never be needed for real.
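+ * (a plain test-and-set spinlock: the lock is taken by swapping 0 -> 1,
+ *  spinning with SwitchToThread() while contended, and released by storing 0,
+ *  as the macros below do.)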
 */
+#include <windows.h>
+static LONG hwloc_components_mutex = 0;
+#define HWLOC_COMPONENTS_LOCK() do {						\
+  while (InterlockedCompareExchange(&hwloc_components_mutex, 1, 0) != 0)	\
+    SwitchToThread();								\
+} while (0)
+#define HWLOC_COMPONENTS_UNLOCK() do {						\
+  assert(hwloc_components_mutex == 1);						\
+  hwloc_components_mutex = 0;							\
+} while (0)
+
+#elif defined HWLOC_HAVE_PTHREAD_MUTEX
+/* pthread mutex if available (except on windows) */
+#include <pthread.h>
+static pthread_mutex_t hwloc_components_mutex = PTHREAD_MUTEX_INITIALIZER;
+#define HWLOC_COMPONENTS_LOCK() pthread_mutex_lock(&hwloc_components_mutex)
+#define HWLOC_COMPONENTS_UNLOCK() pthread_mutex_unlock(&hwloc_components_mutex)
+
+#else /* HWLOC_WIN_SYS || HWLOC_HAVE_PTHREAD_MUTEX */
+#error No mutex implementation available
+#endif
+
+
+#ifdef HWLOC_HAVE_PLUGINS
+
+#include <ltdl.h>
+
+/* array of pointers to dynamically loaded plugins */
+static struct hwloc__plugin_desc {
+  char *name;
+  struct hwloc_component *component;
+  char *filename;
+  lt_dlhandle handle;
+  struct hwloc__plugin_desc *next;
+} *hwloc_plugins = NULL;
+
+static int
+hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
+{
+  const char *basename;
+  lt_dlhandle handle;
+  char *componentsymbolname = NULL;
+  struct hwloc_component *component;
+  struct hwloc__plugin_desc *desc, **prevdesc;
+
+  if (hwloc_plugins_verbose)
+    fprintf(stderr, "Plugin dlforeach found `%s'\n", filename);
+
+  basename = strrchr(filename, '/');
+  if (!basename)
+    basename = filename;
+  else
+    basename++;
+
+  /* dlopen and get the component structure */
+  handle = lt_dlopenext(filename);
+  if (!handle) {
+    if (hwloc_plugins_verbose)
+      fprintf(stderr, "Failed to load plugin: %s\n", lt_dlerror());
+    goto out;
+  }
+  componentsymbolname = malloc(6+strlen(basename)+10+1);
+  if (!componentsymbolname) /* don't let sprintf() write through a failed allocation */
+    goto out_with_handle;
+  sprintf(componentsymbolname, "%s_component", basename);
+  component = lt_dlsym(handle, componentsymbolname);
+  if (!component) {
+    if (hwloc_plugins_verbose)
+      fprintf(stderr, "Failed to find component symbol `%s'\n",
+	      componentsymbolname);
+    goto out_with_handle;
+  }
+  if (component->abi != HWLOC_COMPONENT_ABI) {
+    if (hwloc_plugins_verbose)
+      fprintf(stderr, "Plugin symbol ABI %u instead of %u\n",
+	      component->abi, HWLOC_COMPONENT_ABI);
+    goto out_with_handle;
+  }
+  if (hwloc_plugins_verbose)
+    fprintf(stderr, "Plugin contains expected symbol `%s'\n",
+	    componentsymbolname);
+  free(componentsymbolname);
+  componentsymbolname = NULL;
+
+  if (HWLOC_COMPONENT_TYPE_DISC == component->type) {
+    if (strncmp(basename, "hwloc_", 6)) {
+      if (hwloc_plugins_verbose)
+	fprintf(stderr, "Plugin name `%s' doesn't match its type DISCOVERY\n", basename);
+      goto out_with_handle;
+    }
+  } else if (HWLOC_COMPONENT_TYPE_XML == component->type) {
+    if (strncmp(basename, "hwloc_xml_", 10)) {
+      if (hwloc_plugins_verbose)
+	fprintf(stderr, "Plugin name `%s' doesn't match its type XML\n", basename);
+      goto out_with_handle;
+    }
+  } else {
+    if (hwloc_plugins_verbose)
+      fprintf(stderr, "Plugin name `%s' has invalid type %u\n",
+	      basename, (unsigned) component->type);
+    goto out_with_handle;
+  }
+
+  /* allocate a plugin_desc and queue it */
+  desc = malloc(sizeof(*desc));
+  if (!desc)
+    goto out_with_handle;
+  desc->name = strdup(basename);
+  desc->filename = strdup(filename);
+  desc->component = component;
+  desc->handle = handle;
+  desc->next = NULL;
+  if (hwloc_plugins_verbose)
+    fprintf(stderr, "Plugin descriptor `%s' ready\n", basename);
+
+  /* append to the list */
+  prevdesc = &hwloc_plugins;
+  while (*prevdesc)
+    prevdesc = &((*prevdesc)->next);
+
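/* *prevdesc now points to the list's terminating NULL link; attach the new descriptor there */
+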
*prevdesc = desc; + if (hwloc_plugins_verbose) + fprintf(stderr, "Plugin descriptor `%s' queued\n", basename); + return 0; + + out_with_handle: + lt_dlclose(handle); + free(componentsymbolname); /* NULL if already freed */ + out: + return 0; +} + +static void +hwloc_plugins_exit(void) +{ + struct hwloc__plugin_desc *desc, *next; + + if (hwloc_plugins_verbose) + fprintf(stderr, "Closing all plugins\n"); + + desc = hwloc_plugins; + while (desc) { + next = desc->next; + lt_dlclose(desc->handle); + free(desc->name); + free(desc->filename); + free(desc); + desc = next; + } + hwloc_plugins = NULL; + + lt_dlexit(); +} + +static int +hwloc_plugins_init(void) +{ + char *verboseenv; + char *path = HWLOC_PLUGINS_PATH; + char *env; + int err; + + verboseenv = getenv("HWLOC_PLUGINS_VERBOSE"); + hwloc_plugins_verbose = verboseenv ? atoi(verboseenv) : 0; + + err = lt_dlinit(); + if (err) + goto out; + + env = getenv("HWLOC_PLUGINS_PATH"); + if (env) + path = env; + + hwloc_plugins = NULL; + + if (hwloc_plugins_verbose) + fprintf(stderr, "Starting plugin dlforeach in %s\n", path); + err = lt_dlforeachfile(path, hwloc__dlforeach_cb, NULL); + if (err) + goto out_with_init; + + return 0; + + out_with_init: + hwloc_plugins_exit(); + out: + return -1; +} + +#endif /* HWLOC_HAVE_PLUGINS */ + +static const char * +hwloc_disc_component_type_string(hwloc_disc_component_type_t type) +{ + switch (type) { + case HWLOC_DISC_COMPONENT_TYPE_CPU: return "cpu"; + case HWLOC_DISC_COMPONENT_TYPE_GLOBAL: return "global"; + case HWLOC_DISC_COMPONENT_TYPE_MISC: return "misc"; + default: return "**unknown**"; + } +} + +static int +hwloc_disc_component_register(struct hwloc_disc_component *component, + const char *filename) +{ + struct hwloc_disc_component **prev; + + /* check that the component name is valid */ + if (!strcmp(component->name, HWLOC_COMPONENT_STOP_NAME)) { + if (hwloc_components_verbose) + fprintf(stderr, "Cannot register discovery component with reserved name `" HWLOC_COMPONENT_STOP_NAME "'\n"); + return -1; + } + if (strchr(component->name, HWLOC_COMPONENT_EXCLUDE_CHAR) + || strcspn(component->name, HWLOC_COMPONENT_SEPS) != strlen(component->name)) { + if (hwloc_components_verbose) + fprintf(stderr, "Cannot register discovery component with name `%s' containing reserved characters `%c" HWLOC_COMPONENT_SEPS "'\n", + component->name, HWLOC_COMPONENT_EXCLUDE_CHAR); + return -1; + } + /* check that the component type is valid */ + switch ((unsigned) component->type) { + case HWLOC_DISC_COMPONENT_TYPE_CPU: + case HWLOC_DISC_COMPONENT_TYPE_GLOBAL: + case HWLOC_DISC_COMPONENT_TYPE_MISC: + break; + default: + fprintf(stderr, "Cannot register discovery component `%s' with unknown type %u\n", + component->name, (unsigned) component->type); + return -1; + } + + prev = &hwloc_disc_components; + while (NULL != *prev) { + if (!strcmp((*prev)->name, component->name)) { + /* if two components have the same name, only keep the highest priority one */ + if ((*prev)->priority < component->priority) { + /* drop the existing component */ + if (hwloc_components_verbose) + fprintf(stderr, "Dropping previously registered discovery component `%s', priority %u lower than new one %u\n", + (*prev)->name, (*prev)->priority, component->priority); + *prev = (*prev)->next; + } else { + /* drop the new one */ + if (hwloc_components_verbose) + fprintf(stderr, "Ignoring new discovery component `%s', priority %u lower than previously registered one %u\n", + component->name, component->priority, (*prev)->priority); + return -1; + } + } + prev = 
+  }
+  if (hwloc_components_verbose)
+    fprintf(stderr, "Registered %s discovery component `%s' with priority %u (%s%s)\n",
+            hwloc_disc_component_type_string(component->type), component->name, component->priority,
+            filename ? "from plugin " : "statically built", filename ? filename : "");
+
+  prev = &hwloc_disc_components;
+  while (NULL != *prev) {
+    if ((*prev)->priority < component->priority)
+      break;
+    prev = &((*prev)->next);
+  }
+  component->next = *prev;
+  *prev = component;
+  return 0;
+}
+
+#include <static-components.h>
+
+void
+hwloc_components_init(struct hwloc_topology *topology __hwloc_attribute_unused)
+{
+#ifdef HWLOC_HAVE_PLUGINS
+  struct hwloc__plugin_desc *desc;
+#endif
+  char *verboseenv;
+  unsigned i;
+
+  HWLOC_COMPONENTS_LOCK();
+  assert((unsigned) -1 != hwloc_components_users);
+  if (0 != hwloc_components_users++) {
+    HWLOC_COMPONENTS_UNLOCK();
+    goto ok;
+  }
+
+  verboseenv = getenv("HWLOC_COMPONENTS_VERBOSE");
+  hwloc_components_verbose = verboseenv ? atoi(verboseenv) : 0;
+
+#ifdef HWLOC_HAVE_PLUGINS
+  hwloc_plugins_init();
+#endif
+
+  /* hwloc_static_components is created by configure in static-components.h */
+  for(i=0; NULL != hwloc_static_components[i]; i++) {
+    if (hwloc_static_components[i]->flags) {
+      fprintf(stderr, "Ignoring static component with invalid flags %lx\n",
+              hwloc_static_components[i]->flags);
+      continue;
+    }
+    if (HWLOC_COMPONENT_TYPE_DISC == hwloc_static_components[i]->type)
+      hwloc_disc_component_register(hwloc_static_components[i]->data, NULL);
+/*  else if (HWLOC_COMPONENT_TYPE_XML == hwloc_static_components[i]->type)
+      hwloc_xml_callbacks_register(hwloc_static_components[i]->data); */
+    else
+      assert(0);
+  }
+
+  /* dynamic plugins */
+#ifdef HWLOC_HAVE_PLUGINS
+  for(desc = hwloc_plugins; NULL != desc; desc = desc->next) {
+    if (desc->component->flags) {
+      fprintf(stderr, "Ignoring plugin `%s' component with invalid flags %lx\n",
+              desc->name, desc->component->flags);
+      continue;
+    }
+    if (HWLOC_COMPONENT_TYPE_DISC == desc->component->type)
+      hwloc_disc_component_register(desc->component->data, desc->filename);
+/*  else if (HWLOC_COMPONENT_TYPE_XML == desc->component->type)
+      hwloc_xml_callbacks_register(desc->component->data); */
+    else
+      assert(0);
+  }
+#endif
+
+  HWLOC_COMPONENTS_UNLOCK();
+
+ ok:
+  topology->backends = NULL;
+}
+
+static struct hwloc_disc_component *
+hwloc_disc_component_find(int type /* hwloc_disc_component_type_t or -1 if any */,
+                          const char *name /* name or NULL if any */)
+{
+  struct hwloc_disc_component *comp = hwloc_disc_components;
+  while (NULL != comp) {
+    if ((-1 == type || type == (int) comp->type)
+        && (NULL == name || !strcmp(name, comp->name)))
+      return comp;
+    comp = comp->next;
+  }
+  return NULL;
+}
+
+/* used by set_xml(), set_synthetic(), ... environment variables, ... to force the first backend */
+int
+hwloc_disc_component_force_enable(struct hwloc_topology *topology,
+                                  int envvar_forced,
+                                  int type, const char *name,
+                                  const void *data1, const void *data2, const void *data3)
+{
+  struct hwloc_disc_component *comp;
+  struct hwloc_backend *backend;
+
+  if (topology->is_loaded) {
+    errno = EBUSY;
+    return -1;
+  }
+
+  comp = hwloc_disc_component_find(type, name);
+  if (!comp) {
+    errno = ENOSYS;
+    return -1;
+  }
+
+  backend = comp->instantiate(comp, data1, data2, data3);
+  if (backend) {
+    backend->envvar_forced = envvar_forced;
+    if (topology->backends)
+      hwloc_backends_disable_all(topology);
+    return hwloc_backend_enable(topology, backend);
+  } else
+    return -1;
+}
+
+static int
+hwloc_disc_component_try_enable(struct hwloc_topology *topology,
+                                struct hwloc_disc_component *comp,
+                                const char *comparg,
+                                unsigned *excludes,
+                                int envvar_forced,
+                                int verbose_errors)
+{
+  struct hwloc_backend *backend;
+  int err;
+
+  if ((*excludes) & comp->type) {
+    if (hwloc_components_verbose || verbose_errors)
+      fprintf(stderr, "Excluding %s discovery component `%s', conflicts with excludes 0x%x\n",
+              hwloc_disc_component_type_string(comp->type), comp->name, *excludes);
+    return -1;
+  }
+
+  backend = comp->instantiate(comp, comparg, NULL, NULL);
+  if (!backend) {
+    if (hwloc_components_verbose || verbose_errors)
+      fprintf(stderr, "Failed to instantiate discovery component `%s'\n", comp->name);
+    return -1;
+  }
+
+  backend->envvar_forced = envvar_forced;
+  err = hwloc_backend_enable(topology, backend);
+  if (err < 0)
+    return -1;
+
+  *excludes |= comp->excludes;
+
+  return 0;
+}
+
+void
+hwloc_disc_components_enable_others(struct hwloc_topology *topology)
+{
+  struct hwloc_disc_component *comp;
+  struct hwloc_backend *backend;
+  unsigned excludes = 0;
+  int tryall = 1;
+  char *env;
+
+  env = getenv("HWLOC_COMPONENTS");
+
+  /* compute current excludes */
+  backend = topology->backends;
+  while (backend) {
+    excludes |= backend->component->excludes;
+    backend = backend->next;
+  }
+
+  /* enable explicitly listed components */
+  if (env) {
+    char *curenv = env;
+    size_t s;
+
+    while (*curenv) {
+      s = strcspn(curenv, HWLOC_COMPONENT_SEPS);
+      if (s) {
+        char *arg;
+        char c;
+
+        /* replace libpci with pci for backward compatibility with v1.6 */
+        if (!strncmp(curenv, "libpci", s)) {
+          curenv[0] = curenv[1] = curenv[2] = *HWLOC_COMPONENT_SEPS;
+          curenv += 3;
+          s -= 3;
+        } else if (curenv[0] == HWLOC_COMPONENT_EXCLUDE_CHAR && !strncmp(curenv+1, "libpci", s-1)) {
+          curenv[3] = curenv[0];
+          curenv[0] = curenv[1] = curenv[2] = *HWLOC_COMPONENT_SEPS;
+          curenv += 3;
+          s -= 3;
+          /* skip this name, it's a negated one */
+          goto nextname;
+        }
+
+        if (curenv[0] == HWLOC_COMPONENT_EXCLUDE_CHAR)
+          goto nextname;
+
+        if (!strncmp(curenv, HWLOC_COMPONENT_STOP_NAME, s)) {
+          tryall = 0;
+          break;
+        }
+
+        /* save the last char and replace with \0 */
+        c = curenv[s];
+        curenv[s] = '\0';
+
+        arg = strchr(curenv, '=');
+        if (arg) {
+          *arg = '\0';
+          arg++;
+        }
+
+        comp = hwloc_disc_component_find(-1, curenv);
+        if (comp) {
+          hwloc_disc_component_try_enable(topology, comp, arg, &excludes, 1 /* envvar forced */, 1 /* envvar forced need warnings */);
+        } else {
+          fprintf(stderr, "Cannot find discovery component `%s'\n", curenv);
+        }
+
+        /* restore last char (the second loop below needs env to be unmodified) */
+        curenv[s] = c;
+      }
+
+nextname:
+      curenv += s;
+      if (*curenv)
+        /* Skip comma */
+        curenv++;
+    }
+  }
+
+  /* env is still the same, the above loop didn't modify it */
+
+  /* now enable remaining components (except the explicitly '-'-listed ones) */
+  if (tryall) {
+    comp = hwloc_disc_components;
+    while (NULL != comp) {
+      /* check if this component was explicitly excluded in env */
+      if (env) {
+        char *curenv = env;
+        while (*curenv) {
+          size_t s = strcspn(curenv, HWLOC_COMPONENT_SEPS);
+          if (curenv[0] == HWLOC_COMPONENT_EXCLUDE_CHAR && !strncmp(curenv+1, comp->name, s-1)) {
+            if (hwloc_components_verbose)
+              fprintf(stderr, "Excluding %s discovery component `%s' because of HWLOC_COMPONENTS environment variable\n",
+                      hwloc_disc_component_type_string(comp->type), comp->name);
+            goto nextcomp;
+          }
+          curenv += s;
+          if (*curenv)
+            /* Skip comma */
+            curenv++;
+        }
+      }
+      hwloc_disc_component_try_enable(topology, comp, NULL, &excludes, 0 /* defaults, not envvar forced */, 0 /* defaults don't need warnings on conflicts */);
+nextcomp:
+      comp = comp->next;
+    }
+  }
+
+  if (hwloc_components_verbose) {
+    /* print a summary */
+    int first = 1;
+    backend = topology->backends;
+    fprintf(stderr, "Final list of enabled discovery components: ");
+    while (backend != NULL) {
+      fprintf(stderr, "%s%s", first ? "" : ",", backend->component->name);
+      backend = backend->next;
+      first = 0;
+    }
+    fprintf(stderr, "\n");
+  }
+}
+
+void
+hwloc_components_destroy_all(struct hwloc_topology *topology __hwloc_attribute_unused)
+{
+  HWLOC_COMPONENTS_LOCK();
+  assert(0 != hwloc_components_users);
+  if (0 != --hwloc_components_users) {
+    HWLOC_COMPONENTS_UNLOCK();
+    return;
+  }
+
+  /* no need to unlink/free the list of components, they'll be unloaded below */
+
+  hwloc_disc_components = NULL;
+/* hwloc_xml_callbacks_reset(); */
+
+#ifdef HWLOC_HAVE_PLUGINS
+  hwloc_plugins_exit();
+#endif
+
+  HWLOC_COMPONENTS_UNLOCK();
+}
+
+struct hwloc_backend *
+hwloc_backend_alloc(struct hwloc_disc_component *component)
+{
+  struct hwloc_backend * backend = malloc(sizeof(*backend));
+  if (!backend) {
+    errno = ENOMEM;
+    return NULL;
+  }
+  backend->component = component;
+  backend->flags = 0;
+  backend->discover = NULL;
+  backend->get_obj_cpuset = NULL;
+  backend->notify_new_object = NULL;
+  backend->disable = NULL;
+  backend->is_custom = 0;
+  backend->is_thissystem = -1;
+  backend->next = NULL;
+  backend->envvar_forced = 0;
+  return backend;
+}
+
+static void
+hwloc_backend_disable(struct hwloc_backend *backend)
+{
+  if (backend->disable)
+    backend->disable(backend);
+  free(backend);
+}
+
+int
+hwloc_backend_enable(struct hwloc_topology *topology, struct hwloc_backend *backend)
+{
+  struct hwloc_backend **pprev;
+
+  /* check backend flags */
+  if (backend->flags & (~(HWLOC_BACKEND_FLAG_NEED_LEVELS))) {
+    fprintf(stderr, "Cannot enable %s discovery component `%s' with unknown flags %lx\n",
+            hwloc_disc_component_type_string(backend->component->type), backend->component->name, backend->flags);
+    return -1;
+  }
+
+  /* make sure we didn't already enable this backend, we don't want duplicates */
+  pprev = &topology->backends;
+  while (NULL != *pprev) {
+    if ((*pprev)->component == backend->component) {
+      if (hwloc_components_verbose)
+        fprintf(stderr, "Cannot enable %s discovery component `%s' twice\n",
+                hwloc_disc_component_type_string(backend->component->type), backend->component->name);
+      hwloc_backend_disable(backend);
+      errno = EBUSY;
+      return -1;
+    }
+    pprev = &((*pprev)->next);
+  }
+
+  if (hwloc_components_verbose)
+    fprintf(stderr, "Enabling %s discovery component `%s'\n",
+            hwloc_disc_component_type_string(backend->component->type), backend->component->name);
+
+  /* enqueue at the end */
+  pprev = &topology->backends;
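+  /* walk to the terminating NULL next-pointer; backends stay in enable order, which is also the order discovery will try them in */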
+  while (NULL != *pprev)
+    pprev = &((*pprev)->next);
+  backend->next = *pprev;
+  *pprev = backend;
+
+  backend->topology = topology;
+
+  return 0;
+}
+
+void
+hwloc_backends_is_thissystem(struct hwloc_topology *topology)
+{
+  struct hwloc_backend *backend;
+  char *local_env;
+
+  /* Apply the is_thissystem topology flag before we enforce envvar backends.
+   * If the application changed the backend with set_foo(),
+   * it may use set_flags() to update the is_thissystem flag here.
+   * If it changes the backend with environment variables below,
+   * it may use the HWLOC_THISSYSTEM envvar below as well.
+   */
+
+  topology->is_thissystem = 1;
+
+  /* apply thissystem from normally-given backends (envvar_forced=0, either set_foo() or defaults) */
+  backend = topology->backends;
+  while (backend != NULL) {
+    if (backend->envvar_forced == 0 && backend->is_thissystem != -1) {
+      assert(backend->is_thissystem == 0);
+      topology->is_thissystem = 0;
+    }
+    backend = backend->next;
+  }
+
+  /* override set_foo() with flags */
+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM)
+    topology->is_thissystem = 1;
+
+  /* now apply envvar-forced backend (envvar_forced=1) */
+  backend = topology->backends;
+  while (backend != NULL) {
+    if (backend->envvar_forced == 1 && backend->is_thissystem != -1) {
+      assert(backend->is_thissystem == 0);
+      topology->is_thissystem = 0;
+    }
+    backend = backend->next;
+  }
+
+  /* override with envvar-given flag */
+  local_env = getenv("HWLOC_THISSYSTEM");
+  if (local_env)
+    topology->is_thissystem = atoi(local_env);
+}
+
+int
+hwloc_backends_get_obj_cpuset(struct hwloc_backend *caller, struct hwloc_obj *obj, hwloc_bitmap_t cpuset)
+{
+  struct hwloc_topology *topology = caller->topology;
+  struct hwloc_backend *backend = topology->backends;
+  /* use the first backend's get_obj_cpuset callback */
+  while (backend != NULL) {
+    if (backend->get_obj_cpuset)
+      return backend->get_obj_cpuset(backend, caller, obj, cpuset);
+    backend = backend->next;
+  }
+  return -1;
+}
+
+int
+hwloc_backends_notify_new_object(struct hwloc_backend *caller, struct hwloc_obj *obj)
+{
+  struct hwloc_backend *backend;
+  int res = 0;
+
+  backend = caller->topology->backends;
+  while (NULL != backend) {
+    if (backend != caller && backend->notify_new_object)
+      res += backend->notify_new_object(backend, caller, obj);
+    backend = backend->next;
+  }
+
+  return res;
+}
+
+void
+hwloc_backends_disable_all(struct hwloc_topology *topology)
+{
+  struct hwloc_backend *backend;
+
+  while (NULL != (backend = topology->backends)) {
+    struct hwloc_backend *next = backend->next;
+    if (hwloc_components_verbose)
+      fprintf(stderr, "Disabling %s discovery component `%s'\n",
+              hwloc_disc_component_type_string(backend->component->type), backend->component->name);
+    hwloc_backend_disable(backend);
+    topology->backends = next;
+  }
+  topology->backends = NULL;
+}
diff --git a/ext/hwloc/src/diff.c b/ext/hwloc/src/diff.c
new file mode 100644
index 000000000..e076118e8
--- /dev/null
+++ b/ext/hwloc/src/diff.c
@@ -0,0 +1,403 @@
+/*
+ * Copyright © 2013 Inria. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <private/private.h>
+#include <private/misc.h>
+
+int hwloc_topology_diff_destroy(hwloc_topology_t topology __hwloc_attribute_unused,
+                                hwloc_topology_diff_t diff)
+{
+  hwloc_topology_diff_t next;
+  while (diff) {
+    next = diff->generic.next;
+    switch (diff->generic.type) {
+    default:
+      break;
+    case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR:
+      switch (diff->obj_attr.diff.generic.type) {
+      default:
+        break;
+      case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME:
+      case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO:
+        free(diff->obj_attr.diff.string.name);
+        free(diff->obj_attr.diff.string.oldvalue);
+        free(diff->obj_attr.diff.string.newvalue);
+        break;
+      }
+      break;
+    }
+    free(diff);
+    diff = next;
+  }
+  return 0;
+}
+
+/************************
+ * Computing diffs
+ */
+
+static void hwloc_append_diff(hwloc_topology_diff_t newdiff,
+                              hwloc_topology_diff_t *firstdiffp,
+                              hwloc_topology_diff_t *lastdiffp)
+{
+  if (*firstdiffp)
+    (*lastdiffp)->generic.next = newdiff;
+  else
+    *firstdiffp = newdiff;
+  *lastdiffp = newdiff;
+  newdiff->generic.next = NULL;
+}
+
+static int hwloc_append_diff_too_complex(hwloc_obj_t obj1,
+                                         hwloc_topology_diff_t *firstdiffp,
+                                         hwloc_topology_diff_t *lastdiffp)
+{
+  hwloc_topology_diff_t newdiff;
+  newdiff = malloc(sizeof(*newdiff));
+  if (!newdiff)
+    return -1;
+
+  newdiff->too_complex.type = HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX;
+  newdiff->too_complex.obj_depth = obj1->depth;
+  newdiff->too_complex.obj_index = obj1->logical_index;
+  hwloc_append_diff(newdiff, firstdiffp, lastdiffp);
+  return 0;
+}
+
+static int hwloc_append_diff_obj_attr_string(hwloc_obj_t obj,
+                                             hwloc_topology_diff_obj_attr_type_t type,
+                                             const char *name,
+                                             const char *oldvalue,
+                                             const char *newvalue,
+                                             hwloc_topology_diff_t *firstdiffp,
+                                             hwloc_topology_diff_t *lastdiffp)
+{
+  hwloc_topology_diff_t newdiff;
+
+  if (obj->type == HWLOC_OBJ_MISC)
+    /* TODO: add a custom level/depth for Misc
+     * (checked before allocating so the new diff cannot leak) */
+    return hwloc_append_diff_too_complex(obj, firstdiffp, lastdiffp);
+
+  newdiff = malloc(sizeof(*newdiff));
+  if (!newdiff)
+    return -1;
+
+  newdiff->obj_attr.type = HWLOC_TOPOLOGY_DIFF_OBJ_ATTR;
+  newdiff->obj_attr.obj_depth = obj->depth;
+  newdiff->obj_attr.obj_index = obj->logical_index;
+  newdiff->obj_attr.diff.string.type = type;
+  newdiff->obj_attr.diff.string.name = name ? strdup(name) : NULL;
+  newdiff->obj_attr.diff.string.oldvalue = oldvalue ? strdup(oldvalue) : NULL;
+  newdiff->obj_attr.diff.string.newvalue = newvalue ? strdup(newvalue) : NULL;
+  hwloc_append_diff(newdiff, firstdiffp, lastdiffp);
+  return 0;
+}
+
+static int hwloc_append_diff_obj_attr_uint64(hwloc_obj_t obj,
+                                             hwloc_topology_diff_obj_attr_type_t type,
+                                             hwloc_uint64_t index,
+                                             hwloc_uint64_t oldvalue,
+                                             hwloc_uint64_t newvalue,
+                                             hwloc_topology_diff_t *firstdiffp,
+                                             hwloc_topology_diff_t *lastdiffp)
+{
+  hwloc_topology_diff_t newdiff;
+
+  if (obj->type == HWLOC_OBJ_MISC)
+    /* TODO: add a custom level/depth for Misc
+     * (checked before allocating so the new diff cannot leak) */
+    return hwloc_append_diff_too_complex(obj, firstdiffp, lastdiffp);
+
+  newdiff = malloc(sizeof(*newdiff));
+  if (!newdiff)
+    return -1;
+
+  newdiff->obj_attr.type = HWLOC_TOPOLOGY_DIFF_OBJ_ATTR;
+  newdiff->obj_attr.obj_depth = obj->depth;
+  newdiff->obj_attr.obj_index = obj->logical_index;
+  newdiff->obj_attr.diff.uint64.type = type;
+  newdiff->obj_attr.diff.uint64.index = index;
+  newdiff->obj_attr.diff.uint64.oldvalue = oldvalue;
+  newdiff->obj_attr.diff.uint64.newvalue = newvalue;
+  hwloc_append_diff(newdiff, firstdiffp, lastdiffp);
+  return 0;
+}
+
+static int
+hwloc_diff_trees(hwloc_topology_t topo1, hwloc_obj_t obj1,
+                 hwloc_topology_t topo2, hwloc_obj_t obj2,
+                 unsigned flags,
+                 hwloc_topology_diff_t *firstdiffp, hwloc_topology_diff_t *lastdiffp)
+{
+  unsigned i;
+  int err;
+
+  if (obj1->depth != obj2->depth)
+    goto out_too_complex;
+  if (obj1->type != obj2->type)
+    goto out_too_complex;
+
+  if (obj1->os_index != obj2->os_index)
+    goto out_too_complex;
+
+#define _SETS_DIFFERENT(_set1, _set2) \
+  ( ( !(_set1) != !(_set2) ) \
+    || ( (_set1) && !hwloc_bitmap_isequal(_set1, _set2) ) )
+#define SETS_DIFFERENT(_set, _obj1, _obj2) _SETS_DIFFERENT((_obj1)->_set, (_obj2)->_set)
+  if (SETS_DIFFERENT(cpuset, obj1, obj2)
+      || SETS_DIFFERENT(complete_cpuset, obj1, obj2)
+      || SETS_DIFFERENT(online_cpuset, obj1, obj2)
+      || SETS_DIFFERENT(allowed_cpuset, obj1, obj2)
+      || SETS_DIFFERENT(nodeset, obj1, obj2)
+      || SETS_DIFFERENT(complete_nodeset, obj1, obj2)
+      || SETS_DIFFERENT(allowed_nodeset, obj1, obj2))
+    goto out_too_complex;
+
+  /* no need to check logical_index, sibling_rank, symmetric_subtree */
+
+  if ((!obj1->name) != (!obj2->name)
+      || (obj1->name && strcmp(obj1->name, obj2->name))) {
+    err = hwloc_append_diff_obj_attr_string(obj1,
+                                            HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME,
+                                            NULL,
+                                            obj1->name,
+                                            obj2->name,
+                                            firstdiffp, lastdiffp);
+    if (err < 0)
+      return err;
+  }
+
+  /* memory */
+  if (obj1->memory.local_memory != obj2->memory.local_memory) {
+    err = hwloc_append_diff_obj_attr_uint64(obj1,
+                                            HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_SIZE,
+                                            0,
+                                            obj1->memory.local_memory,
+                                            obj2->memory.local_memory,
+                                            firstdiffp, lastdiffp);
+    if (err < 0)
+      return err;
+  }
+  /* ignore memory page_types */
+
+  /* ignore os_level */
+
+  /* type-specific attrs */
+  switch (obj1->type) {
+  default:
+    break;
+  case HWLOC_OBJ_CACHE:
+    if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->cache)))
+      goto out_too_complex;
+    break;
+  case HWLOC_OBJ_GROUP:
+    if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->group)))
+      goto out_too_complex;
+    break;
+  case HWLOC_OBJ_PCI_DEVICE:
+    if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->pcidev)))
+      goto out_too_complex;
+    break;
+  case HWLOC_OBJ_BRIDGE:
+    if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->bridge)))
+      goto out_too_complex;
+    break;
+  case HWLOC_OBJ_OS_DEVICE:
+    if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->osdev)))
+      goto out_too_complex;
+    break;
+  }
+
+  /* distances */
+  if (obj1->distances_count != obj2->distances_count)
+    goto out_too_complex;
+  for(i=0; i<obj1->distances_count; i++) {
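+    /* compare the i-th distance structures field by field; any mismatch makes the whole subtree too complex to diff */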
+    struct hwloc_distances_s *d1 = obj1->distances[i], *d2 = obj2->distances[i];
+    if (d1->relative_depth != d2->relative_depth
+        || d1->nbobjs != d2->nbobjs
+        || d1->latency_max != d2->latency_max
+        || d1->latency_base != d2->latency_base
+        || memcmp(d1->latency, d2->latency, d1->nbobjs * d1->nbobjs * sizeof(*d1->latency)))
+      goto out_too_complex;
+  }
+
+  /* infos */
+  if (obj1->infos_count != obj2->infos_count)
+    goto out_too_complex;
+  for(i=0; i<obj1->infos_count; i++) {
+    if (strcmp(obj1->infos[i].name, obj2->infos[i].name))
+      goto out_too_complex;
+    if (strcmp(obj1->infos[i].value, obj2->infos[i].value)) {
+      err = hwloc_append_diff_obj_attr_string(obj1,
+                                              HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO,
+                                              obj1->infos[i].name,
+                                              obj1->infos[i].value,
+                                              obj2->infos[i].value,
+                                              firstdiffp, lastdiffp);
+      if (err < 0)
+        return err;
+    }
+  }
+
+  /* ignore userdata */
+
+  /* children */
+  if (obj1->arity != obj2->arity)
+    goto out_too_complex;
+  for(i=0; i<obj1->arity; i++) {
+    err = hwloc_diff_trees(topo1, obj1->children[i],
+                           topo2, obj2->children[i],
+                           flags,
+                           firstdiffp, lastdiffp);
+    if (err < 0)
+      return err;
+  }
+
+  return 0;
+
+out_too_complex:
+  hwloc_append_diff_too_complex(obj1, firstdiffp, lastdiffp);
+  return 0;
+}
+
+int hwloc_topology_diff_build(hwloc_topology_t topo1,
+                              hwloc_topology_t topo2,
+                              unsigned long flags,
+                              hwloc_topology_diff_t *diffp)
+{
+  hwloc_topology_diff_t lastdiff, tmpdiff;
+  int err;
+
+  if (flags != 0) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  *diffp = NULL;
+  err = hwloc_diff_trees(topo1, hwloc_get_root_obj(topo1),
+                         topo2, hwloc_get_root_obj(topo2),
+                         flags,
+                         diffp, &lastdiff);
+
+  if (!err) {
+    tmpdiff = *diffp;
+    while (tmpdiff) {
+      if (tmpdiff->generic.type == HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX) {
+        err = 1;
+        break;
+      }
+      tmpdiff = tmpdiff->generic.next;
+    }
+  }
+
+  return err;
+}
+
+/********************
+ * Applying diffs
+ */
+
+static int
+hwloc_apply_diff_one(hwloc_topology_t topology,
+                     hwloc_topology_diff_t diff,
+                     unsigned long flags)
+{
+  int reverse = !!(flags & HWLOC_TOPOLOGY_DIFF_APPLY_REVERSE);
+
+  switch (diff->generic.type) {
+  case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR: {
+    struct hwloc_topology_diff_obj_attr_s *obj_attr = &diff->obj_attr;
+    hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, obj_attr->obj_depth, obj_attr->obj_index);
+    if (!obj)
+      return -1;
+
+    switch (obj_attr->diff.generic.type) {
+    case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_SIZE: {
+      hwloc_obj_t tmpobj;
+      hwloc_uint64_t oldvalue = reverse ? obj_attr->diff.uint64.newvalue : obj_attr->diff.uint64.oldvalue;
+      hwloc_uint64_t newvalue = reverse ? obj_attr->diff.uint64.oldvalue : obj_attr->diff.uint64.newvalue;
+      hwloc_uint64_t valuediff = newvalue - oldvalue;
+      if (obj->memory.local_memory != oldvalue)
+        return -1;
+      obj->memory.local_memory = newvalue;
+      tmpobj = obj;
+      while (tmpobj) {
+        tmpobj->memory.total_memory += valuediff;
+        tmpobj = tmpobj->parent;
+      }
+      break;
+    }
+    case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME: {
+      const char *oldvalue = reverse ? obj_attr->diff.string.newvalue : obj_attr->diff.string.oldvalue;
+      const char *newvalue = reverse ? obj_attr->diff.string.oldvalue : obj_attr->diff.string.newvalue;
+      if (!obj->name || strcmp(obj->name, oldvalue))
+        return -1;
+      free(obj->name);
+      obj->name = strdup(newvalue);
+      break;
+    }
+    case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO: {
+      const char *name = obj_attr->diff.string.name;
+      const char *oldvalue = reverse ? obj_attr->diff.string.newvalue : obj_attr->diff.string.oldvalue;
+      const char *newvalue = reverse ? obj_attr->diff.string.oldvalue : obj_attr->diff.string.newvalue;
+      unsigned i;
+      int found = 0;
+      for(i=0; i<obj->infos_count; i++) {
+        if (!strcmp(obj->infos[i].name, name)
+            && !strcmp(obj->infos[i].value, oldvalue)) {
+          free(obj->infos[i].value);
+          obj->infos[i].value = strdup(newvalue);
+          found = 1;
+          break;
+        }
+      }
+      if (!found)
+        return -1;
+      break;
+    }
+    default:
+      return -1;
+    }
+
+    break;
+  }
+  default:
+    return -1;
+  }
+
+  return 0;
+}
+
+int hwloc_topology_diff_apply(hwloc_topology_t topology,
+                              hwloc_topology_diff_t diff,
+                              unsigned long flags)
+{
+  hwloc_topology_diff_t tmpdiff, tmpdiff2;
+  int err, nr;
+
+  if (flags & ~HWLOC_TOPOLOGY_DIFF_APPLY_REVERSE) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  tmpdiff = diff;
+  nr = 0;
+  while (tmpdiff) {
+    nr++;
+    err = hwloc_apply_diff_one(topology, tmpdiff, flags);
+    if (err < 0)
+      goto cancel;
+    tmpdiff = tmpdiff->generic.next;
+  }
+  return 0;
+
+cancel:
+  tmpdiff2 = tmpdiff;
+  tmpdiff = diff;
+  while (tmpdiff != tmpdiff2) {
+    hwloc_apply_diff_one(topology, tmpdiff, flags ^ HWLOC_TOPOLOGY_DIFF_APPLY_REVERSE);
+    tmpdiff = tmpdiff->generic.next;
+  }
+  errno = EINVAL;
+  return -nr; /* return the index (starting at 1) of the first element that couldn't be applied */
+}
diff --git a/ext/hwloc/src/distances.c b/ext/hwloc/src/distances.c
new file mode 100644
index 000000000..ca3f7eada
--- /dev/null
+++ b/ext/hwloc/src/distances.c
@@ -0,0 +1,1018 @@
+/*
+ * Copyright © 2010-2013 Inria. All rights reserved.
+ * Copyright © 2011-2012 Université Bordeaux 1
+ * Copyright © 2011 Cisco Systems, Inc. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <private/private.h>
+#include <private/misc.h>
+
+#include <float.h>
+#include <math.h>
+
+/**************************
+ * Main Init/Clear/Destroy
+ */
+
+/* called during topology init */
+void hwloc_distances_init(struct hwloc_topology *topology)
+{
+  topology->first_osdist = topology->last_osdist = NULL;
+}
+
+/* called during topology destroy */
+void hwloc_distances_destroy(struct hwloc_topology * topology)
+{
+  struct hwloc_os_distances_s *osdist, *next = topology->first_osdist;
+  while ((osdist = next) != NULL) {
+    next = osdist->next;
+    /* remove final distance matrices AND physically-ordered ones */
+    free(osdist->indexes);
+    free(osdist->objs);
+    free(osdist->distances);
+    free(osdist);
+  }
+  topology->first_osdist = topology->last_osdist = NULL;
+}
+
+/******************************************************
+ * Inserting distances in the topology
+ * from a backend, from the environment or by the user
+ */
+
+/* insert a distance matrix in the topology.
+ * the caller gives us those pointers, we take care of freeing them later and so on.
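+ * (for instance, a backend may hand over a malloc'ed NUMA latency matrix; it must not free or reuse those arrays afterwards)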
+ */
+void hwloc_distances_set(hwloc_topology_t __hwloc_restrict topology, hwloc_obj_type_t type,
+                         unsigned nbobjs, unsigned *indexes, hwloc_obj_t *objs, float *distances,
+                         int force)
+{
+  struct hwloc_os_distances_s *osdist, *next = topology->first_osdist;
+  /* look for existing distances for the same type */
+  while ((osdist = next) != NULL) {
+    next = osdist->next;
+    if (osdist->type == type) {
+      if (osdist->forced && !force) {
+        /* there is a forced distance element, ignore the new non-forced one */
+        free(indexes);
+        free(objs);
+        free(distances);
+        return;
+      } else if (force) {
+        /* we're forcing a new distance, remove the old ones */
+        free(osdist->indexes);
+        free(osdist->objs);
+        free(osdist->distances);
+        /* remove current object */
+        if (osdist->prev)
+          osdist->prev->next = next;
+        else
+          topology->first_osdist = next;
+        if (next)
+          next->prev = osdist->prev;
+        else
+          topology->last_osdist = osdist->prev;
+        /* free current object */
+        free(osdist);
+      }
+    }
+  }
+
+  if (!nbobjs)
+    /* we're just clearing, return now */
+    return;
+
+  /* create the new element */
+  osdist = malloc(sizeof(struct hwloc_os_distances_s));
+  osdist->nbobjs = nbobjs;
+  osdist->indexes = indexes;
+  osdist->objs = objs;
+  osdist->distances = distances;
+  osdist->forced = force;
+  osdist->type = type;
+  /* insert it */
+  osdist->next = NULL;
+  osdist->prev = topology->last_osdist;
+  if (topology->last_osdist)
+    topology->last_osdist->next = osdist;
+  else
+    topology->first_osdist = osdist;
+  topology->last_osdist = osdist;
+}
+
+/* make sure a user-given distance matrix is sane */
+static int hwloc_distances__check_matrix(hwloc_topology_t __hwloc_restrict topology __hwloc_attribute_unused, hwloc_obj_type_t type __hwloc_attribute_unused,
+                                         unsigned nbobjs, unsigned *indexes, hwloc_obj_t *objs __hwloc_attribute_unused, float *distances __hwloc_attribute_unused)
+{
+  unsigned i,j;
+  /* make sure we don't have the same index twice */
+  for(i=0; i= 2) {
+    /* generate the matrix to create x groups of y elements */
+    if (x*y*z != nbobjs) {
+      fprintf(stderr, "Ignoring %s distances from environment variable, invalid grouping (%u*%u*%u=%u instead of %u)\n",
+              hwloc_obj_type_string(type), x, y, z, x*y*z, nbobjs);
+      free(indexes);
+      free(distances);
+      return;
+    }
+    for(i=0; i
+
+void hwloc_distances_restrict_os(struct hwloc_topology *topology)
+{
+  struct hwloc_os_distances_s *osdist;
+  for(osdist = topology->first_osdist; osdist; osdist = osdist->next) {
+    /* remove the objs array, we'll rebuild it from the indexes
+     * depending on remaining objects */
+    free(osdist->objs);
+    osdist->objs = NULL;
+  }
+}
+
+
+/* cleanup everything we created from distances so that we may rebuild them
+ * at the end of restrict()
+ */
+void hwloc_distances_restrict(struct hwloc_topology *topology, unsigned long flags)
+{
+  if (flags & HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES) {
+    /* some objects may have been removed, clear objects arrays so that finalize_os rebuilds them properly */
+    hwloc_distances_restrict_os(topology);
+  } else {
+    /* if not adapting distances, drop everything */
+    hwloc_distances_destroy(topology);
+  }
+}
+
+/**************************************************************
+ * Convert user/env given array of indexes into actual objects
+ */
+
+static hwloc_obj_t hwloc_find_obj_by_type_and_os_index(hwloc_obj_t root, hwloc_obj_type_t type, unsigned os_index)
+{
+  hwloc_obj_t child;
+  if (root->type == type && root->os_index == os_index)
+    return root;
+  child = root->first_child;
+  while (child) {
+    hwloc_obj_t found = hwloc_find_obj_by_type_and_os_index(child, type, os_index);
+    if (found)
+      return found;
+    child = child->next_sibling;
+  }
+  return NULL;
+}
+
+/* convert distance indexes that were previously stored in the topology
+ * into actual objects if not done already.
+ * it's already done when distances come from backends (this function should not be called then).
+ * it's not done when distances come from the user.
+ *
+ * returns -1 if the matrix was invalid
+ */
+static int
+hwloc_distances__finalize_os(struct hwloc_topology *topology, struct hwloc_os_distances_s *osdist)
+{
+  unsigned nbobjs = osdist->nbobjs;
+  unsigned *indexes = osdist->indexes;
+  float *distances = osdist->distances;
+  unsigned i, j;
+  hwloc_obj_type_t type = osdist->type;
+  hwloc_obj_t *objs = calloc(nbobjs, sizeof(hwloc_obj_t));
+
+  assert(!osdist->objs);
+
+  /* traverse the topology and look for the relevant objects */
+  for(i=0; i<nbobjs; i++) {
+    hwloc_obj_t obj = hwloc_find_obj_by_type_and_os_index(topology->levels[0][0], type, indexes[i]);
+    if (!obj) {
+
+      /* shift the matrix */
+#define OLDPOS(i,j) (distances+(i)*nbobjs+(j))
+#define NEWPOS(i,j) (distances+(i)*(nbobjs-1)+(j))
+      if (i>0) {
+        /** no need to move beginning of 0th line */
+        for(j=0; jnbobjs = nbobjs;
+  if (!nbobjs) {
+    /* the whole matrix was invalid, let the caller remove these distances */
+    free(objs);
+    return -1;
+  }
+
+  /* setup the objs array */
+  osdist->objs = objs;
+  return 0;
+}
+
+
+void hwloc_distances_finalize_os(struct hwloc_topology *topology)
+{
+  int dropall = !topology->levels[0][0]->cpuset; /* we don't support distances on multinode systems */
+
+  struct hwloc_os_distances_s *osdist, *next = topology->first_osdist;
+  while ((osdist = next) != NULL) {
+    int err;
+    next = osdist->next;
+
+    if (dropall)
+      goto drop;
+
+    /* remove final distance matrices AND physically-ordered ones */
+
+    if (osdist->objs)
+      /* nothing to do, switch to the next element */
+      continue;
+
+    err = hwloc_distances__finalize_os(topology, osdist);
+    if (!err)
+      /* convert ok, switch to the next element */
+      continue;
+
+  drop:
+    /* remove this element */
+    free(osdist->indexes);
+    free(osdist->distances);
+    /* remove current object */
+    if (osdist->prev)
+      osdist->prev->next = next;
+    else
+      topology->first_osdist = next;
+    if (next)
+      next->prev = osdist->prev;
+    else
+      topology->last_osdist = osdist->prev;
+    /* free current object */
+    free(osdist);
+  }
+}
+
+/***********************************************************
+ * Convert internal distances given by the backend/env/user
+ * into exported logical distances attached to objects
+ */
+
+static hwloc_obj_t
+hwloc_get_obj_covering_cpuset_nodeset(struct hwloc_topology *topology,
+                                      hwloc_const_cpuset_t cpuset,
+                                      hwloc_const_nodeset_t nodeset)
+{
+  hwloc_obj_t parent = hwloc_get_root_obj(topology), child;
+
+  assert(cpuset);
+  assert(nodeset);
+  assert(hwloc_bitmap_isincluded(cpuset, parent->cpuset));
+  assert(!nodeset || hwloc_bitmap_isincluded(nodeset, parent->nodeset));
+
+ trychildren:
+  child = parent->first_child;
+  while (child) {
+    /* look for a child with a cpuset containing ours.
+     * if it has a nodeset, it must also contain ours.
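+     * (a child without any nodeset is acceptable as long as its cpuset contains ours, see the test below)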
+     */
+    if (child->cpuset && hwloc_bitmap_isincluded(cpuset, child->cpuset)
+        && (!child->nodeset || hwloc_bitmap_isincluded(nodeset, child->nodeset))) {
+      parent = child;
+      goto trychildren;
+    }
+    child = child->next_sibling;
+  }
+  return parent;
+}
+
+static void
+hwloc_distances__finalize_logical(struct hwloc_topology *topology,
+                                  unsigned nbobjs,
+                                  hwloc_obj_t *objs, float *osmatrix)
+{
+  unsigned i, j, li, lj, minl;
+  float min = FLT_MAX, max = FLT_MIN;
+  hwloc_obj_t root;
+  float *matrix;
+  hwloc_cpuset_t cpuset;
+  hwloc_nodeset_t nodeset;
+  unsigned relative_depth;
+  int idx;
+
+  /* find the root */
+  cpuset = hwloc_bitmap_alloc();
+  nodeset = hwloc_bitmap_alloc();
+  for(i=0; i<nbobjs; i++) {
+    hwloc_bitmap_or(cpuset, cpuset, objs[i]->cpuset);
+    if (objs[i]->nodeset)
+      hwloc_bitmap_or(nodeset, nodeset, objs[i]->nodeset);
+  }
+  /* find the object covering cpuset AND nodeset (can't use hwloc_get_obj_covering_cpuset()) */
+  root = hwloc_get_obj_covering_cpuset_nodeset(topology, cpuset, nodeset);
+  if (!root) {
+    /* should not happen, ignore the distance matrix and report an error. */
+    if (!hwloc_hide_errors()) {
+      char *a, *b;
+      hwloc_bitmap_asprintf(&a, cpuset);
+      hwloc_bitmap_asprintf(&b, nodeset);
+      fprintf(stderr, "****************************************************************************\n");
+      fprintf(stderr, "* hwloc has encountered an error when adding a distance matrix to the topology.\n");
+      fprintf(stderr, "*\n");
+      fprintf(stderr, "* hwloc_distances__finalize_logical() could not find any object covering\n");
+      fprintf(stderr, "* cpuset %s and nodeset %s\n", a, b);
+      fprintf(stderr, "*\n");
+      fprintf(stderr, "* Please report this error message to the hwloc user's mailing list,\n");
+#ifdef HWLOC_LINUX_SYS
+      fprintf(stderr, "* along with the output from the hwloc-gather-topology.sh script.\n");
+#else
+      fprintf(stderr, "* along with any relevant topology information from your platform.\n");
+#endif
+      fprintf(stderr, "****************************************************************************\n");
+      free(a);
+      free(b);
+    }
+    hwloc_bitmap_free(cpuset);
+    hwloc_bitmap_free(nodeset);
+    return;
+  }
+  /* don't attach to Misc objects */
+  while (root->type == HWLOC_OBJ_MISC)
+    root = root->parent;
+  /* ideally, root has the exact cpuset and nodeset.
+   * but ignoring or other things that remove objects may have shrunk the object array */
+  assert(hwloc_bitmap_isincluded(cpuset, root->cpuset));
+  assert(hwloc_bitmap_isincluded(nodeset, root->nodeset));
+  hwloc_bitmap_free(cpuset);
+  hwloc_bitmap_free(nodeset);
+  if (root->depth >= objs[0]->depth) {
+    /* strange topology led us to find invalid relative depth, ignore */
+    return;
+  }
+  relative_depth = objs[0]->depth - root->depth; /* this assumes that we have distances between objects of the same level */
+
+  if (nbobjs != hwloc_get_nbobjs_inside_cpuset_by_depth(topology, root->cpuset, root->depth + relative_depth))
+    /* the root does not cover the right number of objects, maybe we failed to insert a root (bad intersect or so). */
+    return;
+
+  /* get the logical index offset, it's the min of all logical indexes */
+  minl = UINT_MAX;
+  for(i=0; i<nbobjs; i++)
+    if (minl > objs[i]->logical_index)
+      minl = objs[i]->logical_index;
+
+  /* compute/check min/max values */
+  for(i=0; i<nbobjs; i++)
+    for(j=0; j<nbobjs; j++) {
+      float val = osmatrix[i*nbobjs+j];
+      if (val < min)
+        min = val;
+      if (val > max)
+        max = val;
+    }
+  if (!min) {
+    /* Linux up to 2.6.36 reports ACPI SLIT distances, which should be memory latencies.
+     * Except on SGI IP27 (SGI Origin 200/2000 with MIPS processors) where the distances
+     * are the number of hops between routers.
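+     * A zero minimum would also make the latency normalization below divide by zero, so such matrices are ignored.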
+     */
+    hwloc_debug("%s", "minimal distance is 0, matrix does not seem to contain latencies, ignoring\n");
+    return;
+  }
+
+  /* store the normalized latency matrix in the root object */
+  idx = root->distances_count++;
+  root->distances = realloc(root->distances, root->distances_count * sizeof(struct hwloc_distances_s *));
+  root->distances[idx] = malloc(sizeof(struct hwloc_distances_s));
+  root->distances[idx]->relative_depth = relative_depth;
+  root->distances[idx]->nbobjs = nbobjs;
+  root->distances[idx]->latency = matrix = malloc(nbobjs*nbobjs*sizeof(float));
+  root->distances[idx]->latency_base = (float) min;
+#define NORMALIZE_LATENCY(d) ((d)/(min))
+  root->distances[idx]->latency_max = NORMALIZE_LATENCY(max);
+  for(i=0; i<nbobjs; i++) {
+    li = objs[i]->logical_index - minl;
+    matrix[li*nbobjs+li] = NORMALIZE_LATENCY(osmatrix[i*nbobjs+i]);
+    for(j=i+1; j<nbobjs; j++) {
+      lj = objs[j]->logical_index - minl;
+      matrix[li*nbobjs+lj] = NORMALIZE_LATENCY(osmatrix[i*nbobjs+j]);
+      matrix[lj*nbobjs+li] = NORMALIZE_LATENCY(osmatrix[j*nbobjs+i]);
+    }
+  }
+}
+
+/* convert internal distances into logically-ordered distances
+ * that can be exposed in the API
+ */
+void
+hwloc_distances_finalize_logical(struct hwloc_topology *topology)
+{
+  unsigned nbobjs;
+  int depth;
+  struct hwloc_os_distances_s * osdist;
+  for(osdist = topology->first_osdist; osdist; osdist = osdist->next) {
+
+    nbobjs = osdist->nbobjs;
+    if (!nbobjs)
+      continue;
+
+    depth = hwloc_get_type_depth(topology, osdist->type);
+    if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE)
+      continue;
+
+    if (osdist->objs) {
+      assert(osdist->distances);
+      hwloc_distances__finalize_logical(topology, nbobjs,
+                                        osdist->objs,
+                                        osdist->distances);
+    }
+  }
+}
+
+/***************************************************
+ * Destroying logical distances attached to objects
+ */
+
+/* destroy an object distances structure */
+void
+hwloc_clear_object_distances_one(struct hwloc_distances_s * distances)
+{
+  free(distances->latency);
+  free(distances);
+}
+
+void
+hwloc_clear_object_distances(hwloc_obj_t obj)
+{
+  unsigned i;
+  for (i=0; i<obj->distances_count; i++)
+    hwloc_clear_object_distances_one(obj->distances[i]);
+  free(obj->distances);
+  obj->distances = NULL;
+  obj->distances_count = 0;
+}
+
+/******************************************
+ * Grouping objects according to distances
+ */
+
+static void hwloc_report_user_distance_error(const char *msg, int line)
+{
+  static int reported = 0;
+
+  if (!reported && !hwloc_hide_errors()) {
+    fprintf(stderr, "****************************************************************************\n");
+    fprintf(stderr, "* hwloc has encountered what looks like an error from user-given distances.\n");
+    fprintf(stderr, "*\n");
+    fprintf(stderr, "* %s\n", msg);
+    fprintf(stderr, "* Error occurred in topology.c line %d\n", line);
+    fprintf(stderr, "*\n");
+    fprintf(stderr, "* Please make sure that distances given through the interface or environment\n");
+    fprintf(stderr, "* variables do not contradict any other topology information.\n");
+    fprintf(stderr, "****************************************************************************\n");
+    reported = 1;
+  }
+}
+
+static int hwloc_compare_distances(float a, float b, float accuracy)
+{
+  if (accuracy != 0.0 && fabsf(a-b) < a * accuracy)
+    return 0;
+  return a < b ? -1 : a == b ? 0 : 1;
+}
+
+/*
+ * Place objects in groups if they are in a transitive graph of minimal distances.
+ * Return how many groups were created, or 0 if some incomplete distance graphs were found.
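+ * For example (hypothetical values): with 4 objects where distances (0,1) and (2,3) are 10 and all other pairs are 20, objects 0 and 1 would get groupid 1, objects 2 and 3 groupid 2, and 2 would be returned.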
+ */
+static unsigned
+hwloc__find_groups_by_min_distance(unsigned nbobjs,
+                                   float *_distances,
+                                   float accuracy,
+                                   unsigned *groupids,
+                                   int verbose)
+{
+  float min_distance = FLT_MAX;
+  unsigned groupid = 1;
+  unsigned i,j,k;
+  unsigned skipped = 0;
+
+#define DISTANCE(i, j) _distances[(i) * nbobjs + (j)]
+
+  memset(groupids, 0, nbobjs*sizeof(*groupids));
+
+  /* find the minimal distance */
+  for(i=0; itype), accuracies[i]);
+    if (needcheck && hwloc__check_grouping_matrix(nbobjs, _distances, accuracies[i], verbose) < 0)
+      continue;
+    nbgroups = hwloc__find_groups_by_min_distance(nbobjs, _distances, accuracies[i], groupids, verbose);
+    if (nbgroups)
+      break;
+  }
+  if (!nbgroups)
+    goto outter_free;
+
+  /* For convenience, put these declarations inside a block. It's a
+     crying shame we can't use C99 syntax here, and have to do a bunch
+     of mallocs. :-( */
+  {
+    hwloc_obj_t *groupobjs = NULL;
+    unsigned *groupsizes = NULL;
+    float *groupdistances = NULL;
+
+    groupobjs = malloc(sizeof(hwloc_obj_t) * nbgroups);
+    groupsizes = malloc(sizeof(unsigned) * nbgroups);
+    groupdistances = malloc(sizeof(float) * nbgroups * nbgroups);
+    if (NULL == groupobjs || NULL == groupsizes || NULL == groupdistances) {
+      goto inner_free;
+    }
+    /* create new Group objects and record their size */
+    memset(&(groupsizes[0]), 0, sizeof(groupsizes[0]) * nbgroups);
+    for(i=0; i<nbgroups; i++) {
+      /* create the Group object */
+      hwloc_obj_t group_obj, res_obj;
+      group_obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, -1);
+      group_obj->cpuset = hwloc_bitmap_alloc();
+      group_obj->attr->group.depth = topology->next_group_depth;
+      for (j=0; j<nbobjs; j++)
+        if (groupids[j] == i+1) {
+          /* assemble the group cpuset */
+          hwloc_bitmap_or(group_obj->cpuset, group_obj->cpuset, objs[j]->cpuset);
+          /* if one obj has a nodeset, assemble a group nodeset */
+          if (objs[j]->nodeset) {
+            if (!group_obj->nodeset)
+              group_obj->nodeset = hwloc_bitmap_alloc();
+            hwloc_bitmap_or(group_obj->nodeset, group_obj->nodeset, objs[j]->nodeset);
+          }
+          groupsizes[i]++;
+        }
+      hwloc_debug_1arg_bitmap("adding Group object with %u objects and cpuset %s\n",
+                              groupsizes[i], group_obj->cpuset);
+      res_obj = hwloc__insert_object_by_cpuset(topology, group_obj,
+                                               fromuser ? hwloc_report_user_distance_error : hwloc_report_os_error);
+      /* res_obj may be different from group_obj if we got groups from XML import before grouping */
+      groupobjs[i] = res_obj;
+    }
+
+    /* factorize distances */
+    memset(&(groupdistances[0]), 0, sizeof(groupdistances[0]) * nbgroups * nbgroups);
+#undef DISTANCE
+#define DISTANCE(i, j) _distances[(i) * nbobjs + (j)]
+#define GROUP_DISTANCE(i, j) groupdistances[(i) * nbgroups + (j)]
+    for(i=0; i<nbobjs; i++)
+      for(j=0; j<nbobjs; j++)
+        GROUP_DISTANCE(groupids[i]-1, groupids[j]-1) += DISTANCE(i, j);
+    for(i=0; i<nbgroups; i++)
+      for(j=0; j<nbgroups; j++)
+        GROUP_DISTANCE(i, j) /= groupsizes[i]*groupsizes[j];
+    topology->next_group_depth++;
+    hwloc__groups_by_distances(topology, nbgroups, groupobjs, (float*) groupdistances, nbaccuracies, accuracies, fromuser, 0 /* no need to check generated matrix */, verbose);
+
+  inner_free:
+    /* Safely free everything */
+    if (NULL != groupobjs) {
+      free(groupobjs);
+    }
+    if (NULL != groupsizes) {
+      free(groupsizes);
+    }
+    if (NULL != groupdistances) {
+      free(groupdistances);
+    }
+  }
+
+ outter_free:
+  if (NULL != groupids) {
+    free(groupids);
+  }
+}
+
+void
+hwloc_group_by_distances(struct hwloc_topology *topology)
+{
+  unsigned nbobjs;
+  struct hwloc_os_distances_s * osdist;
+  char *env;
+  float accuracies[5] = { 0.0f, 0.01f, 0.02f, 0.05f, 0.1f };
+  unsigned nbaccuracies = 5;
+  hwloc_obj_t group_obj;
+  int verbose = 0;
+  unsigned i;
+#ifdef HWLOC_DEBUG
+  unsigned j;
+#endif
+
+  env = getenv("HWLOC_GROUPING");
+  if (env && !atoi(env))
+    return;
+  /* backward compat with v1.2 */
+  if (getenv("HWLOC_IGNORE_DISTANCES"))
+    return;
+
+  env = getenv("HWLOC_GROUPING_ACCURACY");
+  if (!env) {
+    /* only use 0.0 */
+    nbaccuracies = 1;
+  } else if (strcmp(env, "try")) {
+    /* use the given value */
+    nbaccuracies = 1;
+    accuracies[0] = (float) atof(env);
+  } /* otherwise try all values */
+
+#ifdef HWLOC_DEBUG
+  verbose = 1;
+#else
+  env = getenv("HWLOC_GROUPING_VERBOSE");
+  if (env)
+    verbose = atoi(env);
+#endif
+
+  for(osdist = topology->first_osdist; osdist; osdist = osdist->next) {
+
+    nbobjs = osdist->nbobjs;
+    if (!nbobjs)
+      continue;
+
+    if (osdist->objs) {
+      /* if we have objs, we must have distances as well,
+       * thanks to hwloc_convert_distances_indexes_into_objects()
+       */
+      assert(osdist->distances);
+
+#ifdef HWLOC_DEBUG
+      hwloc_debug("%s", "trying to group objects using distance matrix:\n");
+      hwloc_debug("%s", " index");
+      for(j=0; j<nbobjs; j++)
+        hwloc_debug(" % 5d", (int) osdist->objs[j]->os_index);
+      hwloc_debug("%s", "\n");
+      for(i=0; i<nbobjs; i++) {
+        hwloc_debug(" % 5d", (int) osdist->objs[i]->os_index);
+        for(j=0; j<nbobjs; j++)
+          hwloc_debug(" %2.3f", osdist->distances[i*nbobjs + j]);
+        hwloc_debug("%s", "\n");
+      }
+#endif
+
+      hwloc__groups_by_distances(topology, nbobjs,
+                                 osdist->objs,
+                                 osdist->distances,
+                                 nbaccuracies, accuracies,
+                                 osdist->indexes != NULL,
+                                 1 /* check the first matrix */,
+                                 verbose);
+
+      /* add a final group object covering everybody so that the distance matrix can be stored somewhere.
+       * this group will be merged into a regular object if the matrix isn't strangely incomplete
+       */
+      group_obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, -1);
+      group_obj->attr->group.depth = (unsigned) -1;
+      group_obj->cpuset = hwloc_bitmap_alloc();
+      for(i=0; i<nbobjs; i++) {
+        hwloc_bitmap_or(group_obj->cpuset, group_obj->cpuset, osdist->objs[i]->cpuset);
+        /* if one obj has a nodeset, assemble a group nodeset */
+        if (osdist->objs[i]->nodeset) {
+          if (!group_obj->nodeset)
+            group_obj->nodeset = hwloc_bitmap_alloc();
+          hwloc_bitmap_or(group_obj->nodeset, group_obj->nodeset, osdist->objs[i]->nodeset);
+        }
+      }
+      hwloc_debug_1arg_bitmap("adding Group object (as root of distance matrix with %u objects) with cpuset %s\n",
+                              nbobjs, group_obj->cpuset);
+      hwloc__insert_object_by_cpuset(topology, group_obj,
+                                     osdist->indexes != NULL ? hwloc_report_user_distance_error : hwloc_report_os_error);
+    }
+  }
+}
diff --git a/ext/hwloc/src/dolib.c b/ext/hwloc/src/dolib.c
new file mode 100644
index 000000000..cc24fc68b
--- /dev/null
+++ b/ext/hwloc/src/dolib.c
@@ -0,0 +1,47 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009 inria. All rights reserved.
+ * Copyright © 2009, 2012 Université Bordeaux 1
+ * See COPYING in top-level directory.
+ */
+
+/* Wrapper to avoid msys' tendency to turn / into \ and : into ; */
+
+#ifdef HAVE_UNISTD_H
+#include <unistd.h>
+#endif
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(int argc, char *argv[]) {
+  char *prog, *arch, *def, *version, *lib;
+  char s[1024];
+  char name[16];
+  int current, age, revision;
+
+  if (argc != 6) {
+    fprintf(stderr, "bad number of arguments");
+    exit(EXIT_FAILURE);
+  }
+
+  prog = argv[1];
+  arch = argv[2];
+  def = argv[3];
+  version = argv[4];
+  lib = argv[5];
+
+  if (sscanf(version, "%d:%d:%d", &current, &revision, &age) != 3)
+    exit(EXIT_FAILURE);
+
+  snprintf(name, sizeof(name), "libhwloc-%d", current - age);
+  printf("using soname %s\n", name);
+
+  snprintf(s, sizeof(s), "\"%s\" /machine:%s /def:%s /name:%s /out:%s",
+           prog, arch, def, name, lib);
+  if (system(s)) {
+    fprintf(stderr, "%s failed\n", s);
+    exit(EXIT_FAILURE);
+  }
+
+  exit(EXIT_SUCCESS);
+}
diff --git a/ext/hwloc/src/hwloc.dtd b/ext/hwloc/src/hwloc.dtd
new file mode 100644
index 000000000..e932c54a8
--- /dev/null
+++ b/ext/hwloc/src/hwloc.dtd
@@ -0,0 +1,71 @@
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
diff --git a/ext/hwloc/src/misc.c b/ext/hwloc/src/misc.c
new file mode 100644
index 000000000..9ef3be379
--- /dev/null
+++ b/ext/hwloc/src/misc.c
@@ -0,0 +1,106 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2012 Inria. All rights reserved.
+ * Copyright © 2009-2010 Université Bordeaux 1
+ * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <private/private.h>
+#include <private/misc.h>
+
+#include <stdarg.h>
+#ifdef HAVE_SYS_UTSNAME_H
+#include <sys/utsname.h>
+#endif
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <ctype.h>
+
+int hwloc_snprintf(char *str, size_t size, const char *format, ...)
+{
+  int ret;
+  va_list ap;
+  static char bin;
+  size_t fakesize;
+  char *fakestr;
+
+  /* Some systems crash on str == NULL */
+  if (!size) {
+    str = &bin;
+    size = 1;
+  }
+
+  va_start(ap, format);
+  ret = vsnprintf(str, size, format, ap);
+  va_end(ap);
+
+  if (ret >= 0 && (size_t) ret != size-1)
+    return ret;
+
+  /* vsnprintf returned size-1 or -1. That could be a system which reports the
+   * written data and not the actually required room. Try increasing buffer
+   * size to get the latter.
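+   * (C99 requires vsnprintf to return the needed length, but some older implementations return size-1 or -1 on truncation, hence the growing retry loop below)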
+   */
+
+  fakesize = size;
+  fakestr = NULL;
+  do {
+    fakesize *= 2;
+    free(fakestr);
+    fakestr = malloc(fakesize);
+    if (NULL == fakestr)
+      return -1;
+    va_start(ap, format);
+    errno = 0;
+    ret = vsnprintf(fakestr, fakesize, format, ap);
+    va_end(ap);
+  } while ((size_t) ret == fakesize-1 || (ret < 0 && (!errno || errno == ERANGE)));
+
+  if (ret >= 0 && size) {
+    if (size > (size_t) ret+1)
+      size = ret+1;
+    memcpy(str, fakestr, size-1);
+    str[size-1] = 0;
+  }
+  free(fakestr);
+
+  return ret;
+}
+
+int hwloc_namecoloncmp(const char *haystack, const char *needle, size_t n)
+{
+  size_t i = 0;
+  while (*haystack && *haystack != ':') {
+    int ha = *haystack++;
+    int low_h = tolower(ha);
+    int ne = *needle++;
+    int low_n = tolower(ne);
+    if (low_h != low_n)
+      return 1;
+    i++;
+  }
+  return i < n;
+}
+
+void hwloc_add_uname_info(struct hwloc_topology *topology __hwloc_attribute_unused)
+{
+#ifdef HAVE_UNAME
+  struct utsname utsname;
+
+  if (uname(&utsname) < 0)
+    return;
+
+  if (hwloc_obj_get_info_by_name(topology->levels[0][0], "OSName"))
+    /* don't annotate twice */
+    return;
+
+  hwloc_obj_add_info(topology->levels[0][0], "OSName", utsname.sysname);
+  hwloc_obj_add_info(topology->levels[0][0], "OSRelease", utsname.release);
+  hwloc_obj_add_info(topology->levels[0][0], "OSVersion", utsname.version);
+  hwloc_obj_add_info(topology->levels[0][0], "HostName", utsname.nodename);
+  hwloc_obj_add_info(topology->levels[0][0], "Architecture", utsname.machine);
+#endif /* HAVE_UNAME */
+}
diff --git a/ext/hwloc/src/pci-common.c b/ext/hwloc/src/pci-common.c
new file mode 100644
index 000000000..708584d85
--- /dev/null
+++ b/ext/hwloc/src/pci-common.c
@@ -0,0 +1,457 @@
+/*
+ * Copyright © 2009-2013 Inria. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <private/private.h>
+#include <private/debug.h>
+
+static void
+hwloc_pci_traverse_print_cb(void * cbdata __hwloc_attribute_unused,
+                            struct hwloc_obj *pcidev, int depth __hwloc_attribute_unused)
+{
+  char busid[14];
+  snprintf(busid, sizeof(busid), "%04x:%02x:%02x.%01x",
+           pcidev->attr->pcidev.domain, pcidev->attr->pcidev.bus, pcidev->attr->pcidev.dev, pcidev->attr->pcidev.func);
+
+  if (pcidev->type == HWLOC_OBJ_BRIDGE) {
+    if (pcidev->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_HOST)
+      hwloc_debug("%*s HostBridge", depth, "");
+    else
+      hwloc_debug("%*s %s Bridge [%04x:%04x]", depth, "", busid,
+                  pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id);
+    hwloc_debug(" to %04x:[%02x:%02x]\n",
+                pcidev->attr->bridge.downstream.pci.domain, pcidev->attr->bridge.downstream.pci.secondary_bus, pcidev->attr->bridge.downstream.pci.subordinate_bus);
+  } else
+    hwloc_debug("%*s %s Device [%04x:%04x (%04x:%04x) rev=%02x class=%04x]\n", depth, "", busid,
+                pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id,
+                pcidev->attr->pcidev.subvendor_id, pcidev->attr->pcidev.subdevice_id,
+                pcidev->attr->pcidev.revision, pcidev->attr->pcidev.class_id);
+}
+
+static void
+hwloc_pci_traverse_setbridgedepth_cb(void * cbdata __hwloc_attribute_unused,
+                                     struct hwloc_obj *pcidev, int depth)
+{
+  if (pcidev->type == HWLOC_OBJ_BRIDGE)
+    pcidev->attr->bridge.depth = depth;
+}
+
+static void
+hwloc_pci_traverse_lookuposdevices_cb(void * cbdata,
+                                      struct hwloc_obj *pcidev, int depth __hwloc_attribute_unused)
+{
+  struct hwloc_backend *backend = cbdata;
+
+  if (pcidev->type == HWLOC_OBJ_BRIDGE)
+    return;
+
+  hwloc_backends_notify_new_object(backend, pcidev);
+}
+
+static void
+hwloc_pci__traverse(void * cbdata, struct hwloc_obj *root,
+                    void (*cb)(void * cbdata, struct hwloc_obj *, int depth),
+                    int depth)
+{
+  struct hwloc_obj *child = root->first_child;
+  while (child) {
+    cb(cbdata, child, depth);
+    if (child->type == HWLOC_OBJ_BRIDGE)
+      hwloc_pci__traverse(cbdata, child, cb, depth+1);
+    child = child->next_sibling;
+  }
+}
+
+static void
+hwloc_pci_traverse(void * cbdata, struct hwloc_obj *root,
+                   void (*cb)(void * cbdata, struct hwloc_obj *, int depth))
+{
+  hwloc_pci__traverse(cbdata, root, cb, 0);
+}
+
+enum hwloc_pci_busid_comparison_e {
+  HWLOC_PCI_BUSID_LOWER,
+  HWLOC_PCI_BUSID_HIGHER,
+  HWLOC_PCI_BUSID_INCLUDED,
+  HWLOC_PCI_BUSID_SUPERSET
+};
+
+static enum hwloc_pci_busid_comparison_e
+hwloc_pci_compare_busids(struct hwloc_obj *a, struct hwloc_obj *b)
+{
+  if (a->type == HWLOC_OBJ_BRIDGE)
+    assert(a->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI);
+  if (b->type == HWLOC_OBJ_BRIDGE)
+    assert(b->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI);
+
+  if (a->attr->pcidev.domain < b->attr->pcidev.domain)
+    return HWLOC_PCI_BUSID_LOWER;
+  if (a->attr->pcidev.domain > b->attr->pcidev.domain)
+    return HWLOC_PCI_BUSID_HIGHER;
+
+  if (a->type == HWLOC_OBJ_BRIDGE
+      && b->attr->pcidev.bus >= a->attr->bridge.downstream.pci.secondary_bus
+      && b->attr->pcidev.bus <= a->attr->bridge.downstream.pci.subordinate_bus)
+    return HWLOC_PCI_BUSID_SUPERSET;
+  if (b->type == HWLOC_OBJ_BRIDGE
+      && a->attr->pcidev.bus >= b->attr->bridge.downstream.pci.secondary_bus
+      && a->attr->pcidev.bus <= b->attr->bridge.downstream.pci.subordinate_bus)
+    return HWLOC_PCI_BUSID_INCLUDED;
+
+  if (a->attr->pcidev.bus < b->attr->pcidev.bus)
+    return HWLOC_PCI_BUSID_LOWER;
+  if (a->attr->pcidev.bus > b->attr->pcidev.bus)
+    return HWLOC_PCI_BUSID_HIGHER;
+
+  if (a->attr->pcidev.dev < b->attr->pcidev.dev)
+    return HWLOC_PCI_BUSID_LOWER;
+  if (a->attr->pcidev.dev > b->attr->pcidev.dev)
+    return HWLOC_PCI_BUSID_HIGHER;
+
+  if (a->attr->pcidev.func < b->attr->pcidev.func)
+    return HWLOC_PCI_BUSID_LOWER;
+  if (a->attr->pcidev.func > b->attr->pcidev.func)
+    return HWLOC_PCI_BUSID_HIGHER;
+
+  /* Should never reach here.
+     Abort on both debug builds and non-debug builds */
+  assert(0);
+  fprintf(stderr, "Bad assertion in hwloc %s:%d (aborting)\n", __FILE__, __LINE__);
+  exit(1);
+}
+
+static void
+hwloc_pci_add_child_before(struct hwloc_obj *root, struct hwloc_obj *child, struct hwloc_obj *new)
+{
+  if (child) {
+    new->prev_sibling = child->prev_sibling;
+    child->prev_sibling = new;
+  } else {
+    new->prev_sibling = root->last_child;
+    root->last_child = new;
+  }
+
+  if (new->prev_sibling)
+    new->prev_sibling->next_sibling = new;
+  else
+    root->first_child = new;
+  new->next_sibling = child;
+}
+
+static void
+hwloc_pci_remove_child(struct hwloc_obj *root, struct hwloc_obj *child)
+{
+  if (child->next_sibling)
+    child->next_sibling->prev_sibling = child->prev_sibling;
+  else
+    root->last_child = child->prev_sibling;
+  if (child->prev_sibling)
+    child->prev_sibling->next_sibling = child->next_sibling;
+  else
+    root->first_child = child->next_sibling;
+  child->prev_sibling = NULL;
+  child->next_sibling = NULL;
+}
+
+static void hwloc_pci_add_object(struct hwloc_obj *root, struct hwloc_obj *new);
+
+static void
+hwloc_pci_try_insert_siblings_below_new_bridge(struct hwloc_obj *root, struct hwloc_obj *new)
+{
+  enum hwloc_pci_busid_comparison_e comp;
+  struct hwloc_obj *current, *next;
+
+  next = new->next_sibling;
+  while (next) {
+    current = next;
+    next = current->next_sibling;
+
+    comp = hwloc_pci_compare_busids(current, new);
+    assert(comp != HWLOC_PCI_BUSID_SUPERSET);
+    if (comp == HWLOC_PCI_BUSID_HIGHER)
+      continue;
+    assert(comp == HWLOC_PCI_BUSID_INCLUDED);
+
+    /* move this object below the new bridge */
+    hwloc_pci_remove_child(root, current);
+    hwloc_pci_add_object(new, current);
+  }
+}
+
+static void
+hwloc_pci_add_object(struct hwloc_obj *root, struct hwloc_obj *new)
+{
+  struct hwloc_obj *current;
+
+  current = root->first_child;
+  while (current) {
+    enum hwloc_pci_busid_comparison_e comp = hwloc_pci_compare_busids(new, current);
+    switch (comp) {
+    case HWLOC_PCI_BUSID_HIGHER:
+      /* go further */
+      current = current->next_sibling;
+      continue;
+    case HWLOC_PCI_BUSID_INCLUDED:
+      /* insert below current bridge */
+      hwloc_pci_add_object(current, new);
+      return;
+    case HWLOC_PCI_BUSID_LOWER:
+    case HWLOC_PCI_BUSID_SUPERSET:
+      /* insert before current object */
+      hwloc_pci_add_child_before(root, current, new);
+      /* walk next siblings and move them below new bridge if needed */
+      hwloc_pci_try_insert_siblings_below_new_bridge(root, new);
+      return;
+    }
+  }
+  /* add to the end of the list if higher than everybody */
+  hwloc_pci_add_child_before(root, NULL, new);
+}
+
+static struct hwloc_obj *
+hwloc_pci_find_hostbridge_parent(struct hwloc_topology *topology, struct hwloc_backend *backend,
+                                 struct hwloc_obj *hostbridge)
+{
+  hwloc_bitmap_t cpuset = hwloc_bitmap_alloc();
+  struct hwloc_obj *parent;
+  char *env;
+  int err;
+
+  /* override the cpuset with the environment if given */
+  char envname[256];
+  snprintf(envname, sizeof(envname), "HWLOC_PCI_%04x_%02x_LOCALCPUS",
+           hostbridge->first_child->attr->pcidev.domain, hostbridge->first_child->attr->pcidev.bus);
+  env = getenv(envname);
+  if (env) {
+    /* force the hostbridge cpuset */
+    hwloc_debug("Overriding localcpus using %s in the environment\n", envname);
+    hwloc_bitmap_sscanf(cpuset, env);
+  } else {
+    /* get the hostbridge cpuset by asking the OS backend.
+     * it's not a PCI device, so we use its first child locality info.
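+     * If no backend implements get_obj_cpuset(), we fall back to the full topology cpuset below.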
+     */
+    err = hwloc_backends_get_obj_cpuset(backend, hostbridge->first_child, cpuset);
+    if (err < 0)
+      /* if we got nothing, assume the hostbridge is attached to the top of hierarchy */
+      hwloc_bitmap_copy(cpuset, hwloc_topology_get_topology_cpuset(topology));
+  }
+
+  hwloc_debug_bitmap("Attaching hostbridge to cpuset %s\n", cpuset);
+
+  /* restrict to the existing topology cpuset to avoid errors later */
+  hwloc_bitmap_and(cpuset, cpuset, hwloc_topology_get_topology_cpuset(topology));
+
+  /* if the remaining cpuset is empty, take the root */
+  if (hwloc_bitmap_iszero(cpuset))
+    hwloc_bitmap_copy(cpuset, hwloc_topology_get_topology_cpuset(topology));
+
+  /* attach the hostbridge now that it contains the right objects */
+  parent = hwloc_get_obj_covering_cpuset(topology, cpuset);
+  /* in the worst case, we got the root object */
+
+  if (hwloc_bitmap_isequal(cpuset, parent->cpuset)) {
+    /* this object has the right cpuset, but it could be a cache or so,
+     * go up as long as the cpuset is the same
+     */
+    while (parent->parent && hwloc_bitmap_isequal(parent->cpuset, parent->parent->cpuset))
+      parent = parent->parent;
+  } else {
+    /* the object we found is too large, insert an intermediate group */
+    hwloc_obj_t group_obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, -1);
+    if (group_obj) {
+      group_obj->cpuset = hwloc_bitmap_dup(cpuset);
+      group_obj->attr->group.depth = (unsigned) -1;
+      parent = hwloc__insert_object_by_cpuset(topology, group_obj, hwloc_report_os_error);
+      if (parent == group_obj)
+        /* if it didn't get merged, set up its sets */
+        hwloc_fill_object_sets(group_obj);
+    }
+  }
+
+  hwloc_bitmap_free(cpuset);
+
+  return parent;
+}
+
+int
+hwloc_insert_pci_device_list(struct hwloc_backend *backend,
+                             struct hwloc_obj *first_obj)
+{
+  struct hwloc_topology *topology = backend->topology;
+  struct hwloc_obj fakeparent;
+  struct hwloc_obj *obj;
+  unsigned current_hostbridge;
+
+  if (!first_obj)
+    /* found nothing, exit */
+    return 0;
+
+  /* first, organise objects as a tree under a fake parent object */
+  fakeparent.first_child = NULL;
+  fakeparent.last_child = NULL;
+  while (first_obj) {
+    obj = first_obj;
+    first_obj = obj->next_sibling;
+    hwloc_pci_add_object(&fakeparent, obj);
+  }
+
+  hwloc_debug("%s", "\nPCI hierarchy under fake parent:\n");
+  hwloc_pci_traverse(NULL, &fakeparent, hwloc_pci_traverse_print_cb);
+
+  /* walk the hierarchy, set bridge depth and lookup OS devices */
+  hwloc_pci_traverse(NULL, &fakeparent, hwloc_pci_traverse_setbridgedepth_cb);
+  hwloc_pci_traverse(backend, &fakeparent, hwloc_pci_traverse_lookuposdevices_cb);
+
+  /*
+   * fakeparent lists all objects connected to any upstream bus in the machine.
+   * We now create one real hostbridge object per upstream bus.
+   * It's not actually a PCI device so we have to create it.
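+   * (one hostbridge is created per distinct domain/bus pair remaining at the top of the fake tree)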
+ */ + current_hostbridge = 0; + while (fakeparent.first_child) { + /* start a new host bridge */ + struct hwloc_obj *hostbridge = hwloc_alloc_setup_object(HWLOC_OBJ_BRIDGE, current_hostbridge++); + struct hwloc_obj *child = fakeparent.first_child; + struct hwloc_obj *next_child; + struct hwloc_obj *parent; + unsigned short current_domain = child->attr->pcidev.domain; + unsigned char current_bus = child->attr->pcidev.bus; + unsigned char current_subordinate = current_bus; + + hwloc_debug("Starting new PCI hostbridge %04x:%02x\n", current_domain, current_bus); + + /* + * attach all objects from the same upstream domain/bus + */ + next_child: + next_child = child->next_sibling; + hwloc_pci_remove_child(&fakeparent, child); + hwloc_pci_add_child_before(hostbridge, NULL, child); + + /* compute hostbridge secondary/subordinate buses */ + if (child->type == HWLOC_OBJ_BRIDGE + && child->attr->bridge.downstream.pci.subordinate_bus > current_subordinate) + current_subordinate = child->attr->bridge.downstream.pci.subordinate_bus; + + /* use next child if it has the same domains/bus */ + child = next_child; + if (child + && child->attr->pcidev.domain == current_domain + && child->attr->pcidev.bus == current_bus) + goto next_child; + + /* finish setting up this hostbridge */ + hostbridge->attr->bridge.upstream_type = HWLOC_OBJ_BRIDGE_HOST; + hostbridge->attr->bridge.downstream_type = HWLOC_OBJ_BRIDGE_PCI; + hostbridge->attr->bridge.downstream.pci.domain = current_domain; + hostbridge->attr->bridge.downstream.pci.secondary_bus = current_bus; + hostbridge->attr->bridge.downstream.pci.subordinate_bus = current_subordinate; + hwloc_debug("New PCI hostbridge %04x:[%02x-%02x]\n", + current_domain, current_bus, current_subordinate); + + /* attach the hostbridge where it belongs */ + parent = hwloc_pci_find_hostbridge_parent(topology, backend, hostbridge); + hwloc_insert_object_by_parent(topology, parent, hostbridge); + } + + return 1; +} + +#define HWLOC_PCI_STATUS 0x06 +#define HWLOC_PCI_STATUS_CAP_LIST 0x10 +#define HWLOC_PCI_CAPABILITY_LIST 0x34 +#define HWLOC_PCI_CAP_LIST_ID 0 +#define HWLOC_PCI_CAP_LIST_NEXT 1 + +unsigned +hwloc_pci_find_cap(const unsigned char *config, unsigned cap) +{ + unsigned char seen[256] = { 0 }; + unsigned char ptr; /* unsigned char to make sure we stay within the 256-byte config space */ + + if (!(config[HWLOC_PCI_STATUS] & HWLOC_PCI_STATUS_CAP_LIST)) + return 0; + + for (ptr = config[HWLOC_PCI_CAPABILITY_LIST] & ~3; + ptr; /* exit if next is 0 */ + ptr = config[ptr + HWLOC_PCI_CAP_LIST_NEXT] & ~3) { + unsigned char id; + + /* Looped around! 
 */
+    if (seen[ptr])
+      break;
+    seen[ptr] = 1;
+
+    id = config[ptr + HWLOC_PCI_CAP_LIST_ID];
+    if (id == cap)
+      return ptr;
+    if (id == 0xff) /* exit if id is 0 or 0xff */
+      break;
+  }
+  return 0;
+}
+
+#define HWLOC_PCI_EXP_LNKSTA 0x12
+#define HWLOC_PCI_EXP_LNKSTA_SPEED 0x000f
+#define HWLOC_PCI_EXP_LNKSTA_WIDTH 0x03f0
+
+int
+hwloc_pci_find_linkspeed(const unsigned char *config,
+			 unsigned offset, float *linkspeed)
+{
+  unsigned linksta, speed, width;
+  float lanespeed;
+
+  memcpy(&linksta, &config[offset + HWLOC_PCI_EXP_LNKSTA], 4);
+  speed = linksta & HWLOC_PCI_EXP_LNKSTA_SPEED; /* PCIe generation */
+  width = (linksta & HWLOC_PCI_EXP_LNKSTA_WIDTH) >> 4; /* how many lanes */
+  /* PCIe Gen1 = 2.5GT/s signal-rate per lane with 8/10 encoding    = 0.25GB/s data-rate per lane
+   * PCIe Gen2 = 5  GT/s signal-rate per lane with 8/10 encoding    = 0.5 GB/s data-rate per lane
+   * PCIe Gen3 = 8  GT/s signal-rate per lane with 128/130 encoding = 1   GB/s data-rate per lane
+   */
+  lanespeed = speed <= 2 ? 2.5 * speed * 0.8 : 8.0 * 128/130; /* Gbit/s per lane */
+  *linkspeed = lanespeed * width / 8; /* GB/s */
+  return 0;
+}
+
+#define HWLOC_PCI_HEADER_TYPE 0x0e
+#define HWLOC_PCI_HEADER_TYPE_BRIDGE 1
+#define HWLOC_PCI_CLASS_BRIDGE_PCI 0x0604
+#define HWLOC_PCI_PRIMARY_BUS 0x18
+#define HWLOC_PCI_SECONDARY_BUS 0x19
+#define HWLOC_PCI_SUBORDINATE_BUS 0x1a
+
+int
+hwloc_pci_prepare_bridge(hwloc_obj_t obj,
+			 const unsigned char *config)
+{
+  unsigned char headertype;
+  unsigned isbridge;
+  struct hwloc_pcidev_attr_s *pattr = &obj->attr->pcidev;
+  struct hwloc_bridge_attr_s *battr;
+
+  headertype = config[HWLOC_PCI_HEADER_TYPE] & 0x7f;
+  isbridge = (pattr->class_id == HWLOC_PCI_CLASS_BRIDGE_PCI
+	      && headertype == HWLOC_PCI_HEADER_TYPE_BRIDGE);
+
+  if (!isbridge)
+    return 0;
+
+  battr = &obj->attr->bridge;
+
+  if (config[HWLOC_PCI_PRIMARY_BUS] != pattr->bus)
+    hwloc_debug("  %04x:%02x:%02x.%01x bridge with (ignored) invalid PCI_PRIMARY_BUS %02x\n",
+		pattr->domain, pattr->bus, pattr->dev, pattr->func, config[HWLOC_PCI_PRIMARY_BUS]);
+
+  obj->type = HWLOC_OBJ_BRIDGE;
+  battr->upstream_type = HWLOC_OBJ_BRIDGE_PCI;
+  battr->downstream_type = HWLOC_OBJ_BRIDGE_PCI;
+  battr->downstream.pci.domain = pattr->domain;
+  battr->downstream.pci.secondary_bus = config[HWLOC_PCI_SECONDARY_BUS];
+  battr->downstream.pci.subordinate_bus = config[HWLOC_PCI_SUBORDINATE_BUS];
+
+  return 0;
+}
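A worked example of the link-speed computation in hwloc_pci_find_linkspeed() above (editor's sketch; the LNKSTA value is invented):

    /* linksta = 0x1042: speed = 0x1042 & 0x000f = 2 (Gen2),
     * width = (0x1042 & 0x03f0) >> 4 = 4 lanes */
    float lanespeed = 2.5 * 2 * 0.8;     /* 4.0 Gbit/s of data per lane after 8/10 encoding */
    float linkspeed = lanespeed * 4 / 8; /* 2.0 GB/s for the whole x4 link */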
diff --git a/ext/hwloc/src/topology-bgq.c b/ext/hwloc/src/topology-bgq.c
new file mode 100644
index 000000000..5a2e61127
--- /dev/null
+++ b/ext/hwloc/src/topology-bgq.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright © 2013 Inria.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+
+#include <hwloc.h>
+#include <private/private.h>
+#include <private/debug.h>
+
+#include <stdlib.h>
+#include <pthread.h>
+#include <sys/utsname.h>
+#include <spi/include/kernel/location.h>
+#include <spi/include/kernel/process.h>
+
+static int
+hwloc_look_bgq(struct hwloc_backend *backend)
+{
+  struct hwloc_topology *topology = backend->topology;
+  unsigned i;
+  char *env;
+
+  if (!topology->levels[0][0]->cpuset) {
+    /* Nobody created objects yet, setup everything */
+    hwloc_bitmap_t set;
+    hwloc_obj_t obj;
+
+#define HWLOC_BGQ_CORES 17 /* spare core ignored for now */
+
+    hwloc_alloc_obj_cpusets(topology->levels[0][0]);
+    /* mark the 17th core (OS-reserved) as disallowed */
+    hwloc_bitmap_clr_range(topology->levels[0][0]->allowed_cpuset, (HWLOC_BGQ_CORES-1)*4, HWLOC_BGQ_CORES*4-1);
+
+    env = getenv("BG_THREADMODEL");
+    if (!env || atoi(env) != 2) {
+      /* the process cannot use cores/threads outside of its Kernel_ThreadMask() */
+      uint64_t bgmask = Kernel_ThreadMask(Kernel_MyTcoord());
+      /* the mask is reversed, manually reverse it */
+      for(i=0; i<64; i++)
+	if (((bgmask >> i) & 1) == 0)
+	  hwloc_bitmap_clr(topology->levels[0][0]->allowed_cpuset, 63-i);
+    }
+
+    /* a single memory bank */
+    set = hwloc_bitmap_alloc();
+    hwloc_bitmap_set(set, 0);
+    topology->levels[0][0]->nodeset = set;
+    topology->levels[0][0]->memory.local_memory = 16ULL*1024*1024*1024ULL;
+
+    /* socket */
+    obj = hwloc_alloc_setup_object(HWLOC_OBJ_SOCKET, 0);
+    set = hwloc_bitmap_alloc();
+    hwloc_bitmap_set_range(set, 0, HWLOC_BGQ_CORES*4-1);
+    obj->cpuset = set;
+    hwloc_obj_add_info(obj, "CPUModel", "IBM PowerPC A2");
+    hwloc_insert_object_by_cpuset(topology, obj);
+
+    /* shared L2 */
+    obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1);
+    obj->cpuset = hwloc_bitmap_dup(set);
+    obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED;
+    obj->attr->cache.depth = 2;
+    obj->attr->cache.size = 32*1024*1024;
+    obj->attr->cache.linesize = 128;
+    obj->attr->cache.associativity = 16;
+    hwloc_insert_object_by_cpuset(topology, obj);
+
+    /* Cores */
+    for(i=0; i<HWLOC_BGQ_CORES; i++) {
+      /* Core */
+      obj = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, i);
+      set = hwloc_bitmap_alloc();
+      hwloc_bitmap_set_range(set, i*4, i*4+3);
+      obj->cpuset = set;
+      hwloc_insert_object_by_cpuset(topology, obj);
+      /* L1d */
+      obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1);
+      obj->cpuset = hwloc_bitmap_dup(set);
+      obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA;
+      obj->attr->cache.depth = 1;
+      obj->attr->cache.size = 16*1024;
+      obj->attr->cache.linesize = 64;
+      obj->attr->cache.associativity = 8;
+      hwloc_insert_object_by_cpuset(topology, obj);
+      /* L1i */
+      obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1);
+      obj->cpuset = hwloc_bitmap_dup(set);
+      obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION;
+      obj->attr->cache.depth = 1;
+      obj->attr->cache.size = 16*1024;
+      obj->attr->cache.linesize = 64;
+      obj->attr->cache.associativity = 4;
+      hwloc_insert_object_by_cpuset(topology, obj);
+      /* there's also a L1p "prefetch cache" of 4kB with 128B lines */
+    }
+
+    /* PUs */
+    hwloc_setup_pu_level(topology, HWLOC_BGQ_CORES*4);
+  }
+
+  /* Add BGQ specific information */
+
+  hwloc_obj_add_info(topology->levels[0][0], "Backend", "BGQ");
+  if (topology->is_thissystem)
+    hwloc_add_uname_info(topology);
+  return 1;
+}
+
+static int
+hwloc_bgq_get_thread_cpubind(hwloc_topology_t topology, pthread_t thread, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused)
+{
+  unsigned pu;
+  cpu_set_t bg_set;
+  int err;
+
+  if (topology->pid) {
+    errno = ENOSYS;
+    return -1;
+  }
+  err = pthread_getaffinity_np(thread, sizeof(bg_set), &bg_set);
+  if (err) {
+    errno = err;
+    return -1;
+  }
+  for(pu=0; pu<64; pu++)
+    if (CPU_ISSET(pu, &bg_set)) {
+      /* the binding cannot contain multiple PUs */
+      hwloc_bitmap_only(hwloc_set, pu);
+      break;
+    }
+  return 0;
+}
+
+static int
+hwloc_bgq_get_thisthread_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused)
+{
+  if (topology->pid) {
+    errno = ENOSYS;
+    return -1;
+  }
+  hwloc_bitmap_only(hwloc_set, Kernel_ProcessorID());
+  return 0;
+}
+
+static int
+hwloc_bgq_set_thread_cpubind(hwloc_topology_t topology, pthread_t thread, hwloc_const_bitmap_t hwloc_set, int flags)
+{
+  unsigned pu;
+  cpu_set_t bg_set;
+  int err;
+
+  if (topology->pid) {
+    errno = ENOSYS;
+    return -1;
+  }
+  /* the binding cannot contain multiple PUs.
+   * keep the first PU only, and error out if STRICT.
+   */
+  if (hwloc_bitmap_weight(hwloc_set) != 1) {
+    if ((flags & HWLOC_CPUBIND_STRICT)) {
+      errno = ENOSYS;
+      return -1;
+    }
+  }
+  pu = hwloc_bitmap_first(hwloc_set);
+  CPU_ZERO(&bg_set);
+  CPU_SET(pu, &bg_set);
+  err = pthread_setaffinity_np(thread, sizeof(bg_set), &bg_set);
+  if (err) {
+    errno = err;
+    return -1;
+  }
+  return 0;
+}
+
+static int
+hwloc_bgq_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags)
+{
+  return hwloc_bgq_set_thread_cpubind(topology, pthread_self(), hwloc_set, flags);
+}
+
+void
+hwloc_set_bgq_hooks(struct hwloc_binding_hooks *hooks __hwloc_attribute_unused,
+		    struct hwloc_topology_support *support __hwloc_attribute_unused)
+{
+  hooks->set_thisthread_cpubind = hwloc_bgq_set_thisthread_cpubind;
+  hooks->set_thread_cpubind = hwloc_bgq_set_thread_cpubind;
+  hooks->get_thisthread_cpubind = hwloc_bgq_get_thisthread_cpubind;
+  hooks->get_thread_cpubind = hwloc_bgq_get_thread_cpubind;
+  /* threads cannot be bound to more than one PU, so get_last_cpu_location == get_cpubind */
+  hooks->get_thisthread_last_cpu_location = hwloc_bgq_get_thisthread_cpubind;
+  /* hooks->get_thread_last_cpu_location = hwloc_bgq_get_thread_cpubind; */
+}
+
+static struct hwloc_backend *
+hwloc_bgq_component_instantiate(struct hwloc_disc_component *component,
+				const void *_data1 __hwloc_attribute_unused,
+				const void *_data2 __hwloc_attribute_unused,
+				const void *_data3 __hwloc_attribute_unused)
+{
+  struct utsname utsname;
+  struct hwloc_backend *backend;
+  char *env;
+  int err;
+
+  env = getenv("HWLOC_FORCE_BGQ");
+  if (!env || !atoi(env)) {
+    err = uname(&utsname);
+    if (err || strcmp(utsname.sysname, "CNK") || strcmp(utsname.machine, "BGQ")) {
+      fprintf(stderr, "*** Found unexpected uname sysname `%s' machine `%s', disabling BGQ backend.\n", utsname.sysname, utsname.machine);
+      fprintf(stderr, "*** Set HWLOC_FORCE_BGQ=1 in the environment to enforce the BGQ backend.\n");
+      return NULL;
+    }
+  }
+
+  backend = hwloc_backend_alloc(component);
+  if (!backend)
+    return NULL;
+  backend->discover = hwloc_look_bgq;
+  return backend;
+}
+
+static struct hwloc_disc_component hwloc_bgq_disc_component = {
+  HWLOC_DISC_COMPONENT_TYPE_GLOBAL,
+  "bgq",
+  ~0,
+  hwloc_bgq_component_instantiate,
+  50,
+  NULL
+};
+
+const struct hwloc_component hwloc_bgq_component = {
+  HWLOC_COMPONENT_ABI,
+  HWLOC_COMPONENT_TYPE_DISC,
+  0,
+  &hwloc_bgq_disc_component
+};
diff --git a/ext/hwloc/src/topology-custom.c b/ext/hwloc/src/topology-custom.c
new file mode 100644
index 000000000..23077bfde
--- /dev/null
+++ b/ext/hwloc/src/topology-custom.c
@@ -0,0 +1,99 @@
+/*
+ * Copyright © 2011-2012 Inria.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
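A minimal sketch of how an application drives the custom backend implemented below (editor's example; `realtopo` is an already-loaded hwloc_topology_t, and all calls are the hwloc 1.x API that this file implements — assembly must happen between set_custom() and load()):

    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_set_custom(topo);                /* enable the custom backend */
    hwloc_obj_t group = hwloc_custom_insert_group_object_by_parent(topo,
                          hwloc_get_root_obj(topo), 0);
    hwloc_custom_insert_topology(topo, group, realtopo, NULL);
    hwloc_topology_load(topo);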
+
+#include <private/autogen/config.h>
+#include <private/private.h>
+#include <private/debug.h>
+
+hwloc_obj_t
+hwloc_custom_insert_group_object_by_parent(struct hwloc_topology *topology, hwloc_obj_t parent, int groupdepth)
+{
+  hwloc_obj_t obj;
+
+  /* must be called between set_custom() and load(), so there's a single backend, the custom one */
+  if (topology->is_loaded || !topology->backends || !topology->backends->is_custom) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, -1);
+  obj->attr->group.depth = groupdepth;
+  hwloc_obj_add_info(obj, "Backend", "Custom");
+  hwloc_insert_object_by_parent(topology, parent, obj);
+  /* insert_object_by_parent() doesn't merge during insert, so obj is still valid */
+
+  return obj;
+}
+
+int
+hwloc_custom_insert_topology(struct hwloc_topology *newtopology,
+			     struct hwloc_obj *newparent,
+			     struct hwloc_topology *oldtopology,
+			     struct hwloc_obj *oldroot)
+{
+  /* must be called between set_custom() and load(), so there's a single backend, the custom one */
+  if (newtopology->is_loaded || !newtopology->backends || !newtopology->backends->is_custom) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (!oldtopology->is_loaded) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  hwloc__duplicate_objects(newtopology, newparent, oldroot ? oldroot : oldtopology->levels[0][0]);
+  return 0;
+}
+
+static int
+hwloc_look_custom(struct hwloc_backend *backend)
+{
+  struct hwloc_topology *topology = backend->topology;
+  hwloc_obj_t root = topology->levels[0][0];
+
+  assert(!root->cpuset);
+
+  if (!root->first_child) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  root->type = HWLOC_OBJ_SYSTEM;
+  hwloc_obj_add_info(root, "Backend", "Custom");
+  return 1;
+}
+
+static struct hwloc_backend *
+hwloc_custom_component_instantiate(struct hwloc_disc_component *component,
+				   const void *_data1 __hwloc_attribute_unused,
+				   const void *_data2 __hwloc_attribute_unused,
+				   const void *_data3 __hwloc_attribute_unused)
+{
+  struct hwloc_backend *backend;
+  backend = hwloc_backend_alloc(component);
+  if (!backend)
+    return NULL;
+  backend->discover = hwloc_look_custom;
+  backend->is_custom = 1;
+  backend->is_thissystem = 0;
+  return backend;
+}
+
+static struct hwloc_disc_component hwloc_custom_disc_component = {
+  HWLOC_DISC_COMPONENT_TYPE_GLOBAL,
+  "custom",
+  ~0,
+  hwloc_custom_component_instantiate,
+  30,
+  NULL
+};
+
+const struct hwloc_component hwloc_custom_component = {
+  HWLOC_COMPONENT_ABI,
+  HWLOC_COMPONENT_TYPE_DISC,
+  0,
+  &hwloc_custom_disc_component
+};
diff --git a/ext/hwloc/src/topology-darwin.c b/ext/hwloc/src/topology-darwin.c
new file mode 100644
index 000000000..39e13a3f3
--- /dev/null
+++ b/ext/hwloc/src/topology-darwin.c
@@ -0,0 +1,306 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2012 Inria.  All rights reserved.
+ * Copyright © 2009-2013 Université Bordeaux 1
+ * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/* TODO: detect topology changes by registering for power-management change
+ * notifications and checking whether, e.g., hw.activecpu changed */
+
+/* Apparently, Darwin people do not _want_ to provide binding functions. */
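The discovery below is driven entirely by sysctl queries. A minimal sketch of the kind of call involved (editor's example; hw.ncpu is a plain int, while the hwloc_get_sysctlbyname() helper used below also copes with 64-bit values):

    #include <sys/types.h>
    #include <sys/sysctl.h>

    int ncpu = 0;
    size_t sz = sizeof(ncpu);
    if (sysctlbyname("hw.ncpu", &ncpu, &sz, NULL, 0) == 0)
      /* ncpu now holds the number of logical processors */;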
+
+#include <private/autogen/config.h>
+
+#include <sys/types.h>
+#include <sys/sysctl.h>
+#include <stdlib.h>
+#include <inttypes.h>
+
+#include <hwloc.h>
+#include <private/private.h>
+#include <private/debug.h>
+
+static int
+hwloc_look_darwin(struct hwloc_backend *backend)
+{
+  struct hwloc_topology *topology = backend->topology;
+  int64_t _nprocs;
+  unsigned nprocs;
+  int64_t _npackages;
+  unsigned i, j, cpu;
+  struct hwloc_obj *obj;
+  size_t size;
+  int64_t l1dcachesize, l1icachesize;
+  int64_t cacheways[2];
+  int64_t l2cachesize;
+  int64_t cachelinesize;
+  int64_t memsize;
+  char cpumodel[64];
+
+  if (topology->levels[0][0]->cpuset)
+    /* somebody discovered things */
+    return 0;
+
+  hwloc_alloc_obj_cpusets(topology->levels[0][0]);
+
+  if (hwloc_get_sysctlbyname("hw.ncpu", &_nprocs) || _nprocs <= 0)
+    return -1;
+  nprocs = _nprocs;
+  topology->support.discovery->pu = 1;
+
+  hwloc_debug("%u procs\n", nprocs);
+
+  size = sizeof(cpumodel);
+  if (sysctlbyname("machdep.cpu.brand_string", cpumodel, &size, NULL, 0))
+    cpumodel[0] = '\0';
+
+  if (!hwloc_get_sysctlbyname("hw.packages", &_npackages) && _npackages > 0) {
+    unsigned npackages = _npackages;
+    int64_t _cores_per_package;
+    int64_t _logical_per_package;
+    unsigned logical_per_package;
+
+    hwloc_debug("%u packages\n", npackages);
+
+    if (!hwloc_get_sysctlbyname("machdep.cpu.logical_per_package", &_logical_per_package) && _logical_per_package > 0)
+      logical_per_package = _logical_per_package;
+    else
+      /* Assume the trivial mapping. */
+      logical_per_package = nprocs / npackages;
+
+    hwloc_debug("%u threads per package\n", logical_per_package);
+
+
+    if (nprocs == npackages * logical_per_package)
+      for (i = 0; i < npackages; i++) {
+        obj = hwloc_alloc_setup_object(HWLOC_OBJ_SOCKET, i);
+        obj->cpuset = hwloc_bitmap_alloc();
+        for (cpu = i*logical_per_package; cpu < (i+1)*logical_per_package; cpu++)
+          hwloc_bitmap_set(obj->cpuset, cpu);
+
+        hwloc_debug_1arg_bitmap("package %u has cpuset %s\n",
+                   i, obj->cpuset);
+
+        if (cpumodel[0] != '\0')
+          hwloc_obj_add_info(obj, "CPUModel", cpumodel);
+        hwloc_insert_object_by_cpuset(topology, obj);
+      }
+    else
+      if (cpumodel[0] != '\0')
+        hwloc_obj_add_info(topology->levels[0][0], "CPUModel", cpumodel);
+
+    if (!hwloc_get_sysctlbyname("machdep.cpu.cores_per_package", &_cores_per_package) && _cores_per_package > 0) {
+      unsigned cores_per_package = _cores_per_package;
+      hwloc_debug("%u cores per package\n", cores_per_package);
+
+      if (!(logical_per_package % cores_per_package))
+        for (i = 0; i < npackages * cores_per_package; i++) {
+          obj = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, i);
+          obj->cpuset = hwloc_bitmap_alloc();
+          for (cpu = i*(logical_per_package/cores_per_package);
+               cpu < (i+1)*(logical_per_package/cores_per_package);
+               cpu++)
+            hwloc_bitmap_set(obj->cpuset, cpu);
+
+          hwloc_debug_1arg_bitmap("core %u has cpuset %s\n",
+                     i, obj->cpuset);
+          hwloc_insert_object_by_cpuset(topology, obj);
+        }
+    }
+  } else
+    if (cpumodel[0] != '\0')
+      hwloc_obj_add_info(topology->levels[0][0], "CPUModel", cpumodel);
+
+  if (hwloc_get_sysctlbyname("hw.l1dcachesize", &l1dcachesize))
+    l1dcachesize = 0;
+
+  if (hwloc_get_sysctlbyname("hw.l1icachesize", &l1icachesize))
+    l1icachesize = 0;
+
+  if (hwloc_get_sysctlbyname("hw.l2cachesize", &l2cachesize))
+    l2cachesize = 0;
+
+  if (hwloc_get_sysctlbyname("machdep.cpu.cache.L1_associativity", &cacheways[0]))
+    cacheways[0] = 0;
+  else if (cacheways[0] == 0xff)
+    cacheways[0] = -1;
+
+  if (hwloc_get_sysctlbyname("machdep.cpu.cache.L2_associativity", &cacheways[1]))
+    cacheways[1] = 0;
+  else if (cacheways[1] == 0xff)
+    cacheways[1] = -1;
+
+  if (hwloc_get_sysctlbyname("hw.cachelinesize", &cachelinesize))
+ cachelinesize = 0; + + if (hwloc_get_sysctlbyname("hw.memsize", &memsize)) + memsize = 0; + + if (!sysctlbyname("hw.cacheconfig", NULL, &size, NULL, 0)) { + unsigned n = size / sizeof(uint32_t); + uint64_t *cacheconfig = NULL; + uint64_t *cachesize = NULL; + uint32_t *cacheconfig32 = NULL; + + cacheconfig = malloc(sizeof(uint64_t) * n); + if (NULL == cacheconfig) { + goto out; + } + cachesize = malloc(sizeof(uint64_t) * n); + if (NULL == cachesize) { + goto out; + } + cacheconfig32 = malloc(sizeof(uint32_t) * n); + if (NULL == cacheconfig32) { + goto out; + } + + if ((!sysctlbyname("hw.cacheconfig", cacheconfig, &size, NULL, 0))) { + /* Yeech. Darwin seemingly has changed from 32bit to 64bit integers for + * cacheconfig, with apparently no way for detection. Assume the machine + * won't have more than 4 billion cpus */ + if (cacheconfig[0] > 0xFFFFFFFFUL) { + memcpy(cacheconfig32, cacheconfig, size); + for (i = 0 ; i < size / sizeof(uint32_t); i++) + cacheconfig[i] = cacheconfig32[i]; + } + + memset(cachesize, 0, sizeof(uint64_t) * n); + size = sizeof(uint64_t) * n; + if (sysctlbyname("hw.cachesize", cachesize, &size, NULL, 0)) { + if (n > 0) + cachesize[0] = memsize; + if (n > 1) + cachesize[1] = l1dcachesize; + if (n > 2) + cachesize[2] = l2cachesize; + } + + hwloc_debug("%s", "caches"); + for (i = 0; i < n && cacheconfig[i]; i++) + hwloc_debug(" %"PRIu64"(%"PRIu64"kB)", cacheconfig[i], cachesize[i] / 1024); + + /* Now we know how many caches there are */ + n = i; + hwloc_debug("\n%u cache levels\n", n - 1); + + /* For each cache level (0 is memory) */ + for (i = 0; i < n; i++) { + /* cacheconfig tells us how many cpus share it, let's iterate on each cache */ + for (j = 0; j < (nprocs / cacheconfig[i]); j++) { + obj = hwloc_alloc_setup_object(i?HWLOC_OBJ_CACHE:HWLOC_OBJ_NODE, j); + if (!i) { + obj->nodeset = hwloc_bitmap_alloc(); + hwloc_bitmap_set(obj->nodeset, j); + } + obj->cpuset = hwloc_bitmap_alloc(); + for (cpu = j*cacheconfig[i]; + cpu < ((j+1)*cacheconfig[i]); + cpu++) + hwloc_bitmap_set(obj->cpuset, cpu); + + if (i == 1 && l1icachesize) { + /* FIXME assuming that L1i and L1d are shared the same way. Darwin + * does not yet provide a way to know. 
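 *
 * (Editor's illustration of this cache loop, with invented numbers: given
 * nprocs = 8 and cacheconfig = { 8, 2, 8 }, level 0 creates one NUMA node
 * covering all 8 PUs, level 1 creates four 2-PU L1 caches, and level 2
 * creates one L2 cache shared by all 8 PUs.)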
 */
+              hwloc_obj_t l1i = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, j);
+              l1i->cpuset = hwloc_bitmap_dup(obj->cpuset);
+              hwloc_debug_1arg_bitmap("L1icache %u has cpuset %s\n",
+                  j, l1i->cpuset);
+              l1i->attr->cache.depth = i;
+              l1i->attr->cache.size = l1icachesize;
+              l1i->attr->cache.linesize = cachelinesize;
+              l1i->attr->cache.associativity = 0;
+              l1i->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION;
+
+              hwloc_insert_object_by_cpuset(topology, l1i);
+            }
+            if (i) {
+              hwloc_debug_2args_bitmap("L%ucache %u has cpuset %s\n",
+                  i, j, obj->cpuset);
+              obj->attr->cache.depth = i;
+              obj->attr->cache.size = cachesize[i];
+              obj->attr->cache.linesize = cachelinesize;
+              if (i <= sizeof(cacheways) / sizeof(cacheways[0]))
+                obj->attr->cache.associativity = cacheways[i-1];
+              else
+                obj->attr->cache.associativity = 0;
+              if (i == 1 && l1icachesize)
+                obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA;
+              else
+                obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED;
+            } else {
+              hwloc_debug_1arg_bitmap("node %u has cpuset %s\n",
+                  j, obj->cpuset);
+              obj->memory.local_memory = cachesize[i];
+              obj->memory.page_types_len = 2;
+              obj->memory.page_types = malloc(2*sizeof(*obj->memory.page_types));
+              memset(obj->memory.page_types, 0, 2*sizeof(*obj->memory.page_types));
+              obj->memory.page_types[0].size = hwloc_getpagesize();
+#ifdef HAVE__SC_LARGE_PAGESIZE
+              obj->memory.page_types[1].size = sysconf(_SC_LARGE_PAGESIZE);
+#endif
+            }
+
+            hwloc_insert_object_by_cpuset(topology, obj);
+          }
+        }
+      }
+  out:
+    if (NULL != cacheconfig) {
+      free(cacheconfig);
+    }
+    if (NULL != cachesize) {
+      free(cachesize);
+    }
+    if (NULL != cacheconfig32) {
+      free(cacheconfig32);
+    }
+  }
+
+
+  /* add PU objects */
+  hwloc_setup_pu_level(topology, nprocs);
+
+  hwloc_obj_add_info(topology->levels[0][0], "Backend", "Darwin");
+  if (topology->is_thissystem)
+    hwloc_add_uname_info(topology);
+  return 1;
+}
+
+void
+hwloc_set_darwin_hooks(struct hwloc_binding_hooks *hooks __hwloc_attribute_unused,
+		       struct hwloc_topology_support *support __hwloc_attribute_unused)
+{
+}
+
+static struct hwloc_backend *
+hwloc_darwin_component_instantiate(struct hwloc_disc_component *component,
+				   const void *_data1 __hwloc_attribute_unused,
+				   const void *_data2 __hwloc_attribute_unused,
+				   const void *_data3 __hwloc_attribute_unused)
+{
+  struct hwloc_backend *backend;
+  backend = hwloc_backend_alloc(component);
+  if (!backend)
+    return NULL;
+  backend->discover = hwloc_look_darwin;
+  return backend;
+}
+
+static struct hwloc_disc_component hwloc_darwin_disc_component = {
+  HWLOC_DISC_COMPONENT_TYPE_CPU,
+  "darwin",
+  HWLOC_DISC_COMPONENT_TYPE_GLOBAL,
+  hwloc_darwin_component_instantiate,
+  50,
+  NULL
+};
+
+const struct hwloc_component hwloc_darwin_component = {
+  HWLOC_COMPONENT_ABI,
+  HWLOC_COMPONENT_TYPE_DISC,
+  0,
+  &hwloc_darwin_disc_component
+};
diff --git a/ext/hwloc/src/topology-fake.c b/ext/hwloc/src/topology-fake.c
new file mode 100644
index 000000000..cc50d31d7
--- /dev/null
+++ b/ext/hwloc/src/topology-fake.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright © 2012 Inria.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
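For readers matching the brace-initializers in these component declarations to their fields, here is the same shape annotated (editor's sketch based on the initializers above; `example_instantiate` is a hypothetical callback):

    static struct hwloc_disc_component example_disc_component = {
      HWLOC_DISC_COMPONENT_TYPE_MISC,  /* component type */
      "example",                       /* name */
      0,                               /* bitmask of component types to exclude */
      example_instantiate,             /* instantiate() callback */
      100,                             /* priority, higher is tried first */
      NULL                             /* next pointer, managed by the core */
    };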
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <private/private.h>
+
+#include <stdlib.h>
+
+static struct hwloc_backend *
+hwloc_fake_component_instantiate(struct hwloc_disc_component *component __hwloc_attribute_unused,
+				 const void *_data1 __hwloc_attribute_unused,
+				 const void *_data2 __hwloc_attribute_unused,
+				 const void *_data3 __hwloc_attribute_unused)
+{
+  if (hwloc_plugin_check_namespace("fake", "hwloc_backend_alloc") < 0)
+    return NULL;
+  if (getenv("HWLOC_DEBUG_FAKE_COMPONENT"))
+    printf("fake component instantiated\n");
+  return NULL;
+}
+
+static struct hwloc_disc_component hwloc_fake_disc_component = {
+  HWLOC_DISC_COMPONENT_TYPE_MISC, /* so that it's always enabled when using the OS discovery */
+  "fake",
+  0, /* nothing to exclude */
+  hwloc_fake_component_instantiate,
+  100, /* make sure it's loaded before anything conflicting excludes it */
+  NULL
+};
+
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_fake_component; /* never linked statically in the core */
+
+const struct hwloc_component hwloc_fake_component = {
+  HWLOC_COMPONENT_ABI,
+  HWLOC_COMPONENT_TYPE_DISC,
+  0,
+  &hwloc_fake_disc_component
+};
diff --git a/ext/hwloc/src/topology-freebsd.c b/ext/hwloc/src/topology-freebsd.c
new file mode 100644
index 000000000..7e13ca166
--- /dev/null
+++ b/ext/hwloc/src/topology-freebsd.c
@@ -0,0 +1,250 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2013 Inria.  All rights reserved.
+ * Copyright © 2009-2010, 2012 Université Bordeaux 1
+ * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+
+#include <sys/types.h>
+#include <stdlib.h>
+#include <inttypes.h>
+#include <sys/param.h>
+#include <pthread.h>
+#ifdef HAVE_PTHREAD_NP_H
+#include <pthread_np.h>
+#endif
+#ifdef HAVE_SYS_CPUSET_H
+#include <sys/cpuset.h>
+#endif
+#ifdef HAVE_SYS_SYSCTL_H
+#include <sys/sysctl.h>
+#endif
+
+#include <hwloc.h>
+#include <private/private.h>
+#include <private/debug.h>
+
+#if defined(HAVE_SYS_CPUSET_H) && defined(HAVE_CPUSET_SETAFFINITY)
+static void
+hwloc_freebsd_bsd2hwloc(hwloc_bitmap_t hwloc_cpuset, const cpuset_t *cset)
+{
+  unsigned cpu;
+  hwloc_bitmap_zero(hwloc_cpuset);
+  for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
+    if (CPU_ISSET(cpu, cset))
+      hwloc_bitmap_set(hwloc_cpuset, cpu);
+}
+
+static void
+hwloc_freebsd_hwloc2bsd(hwloc_const_bitmap_t hwloc_cpuset, cpuset_t *cset)
+{
+  unsigned cpu;
+  CPU_ZERO(cset);
+  for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
+    if (hwloc_bitmap_isset(hwloc_cpuset, cpu))
+      CPU_SET(cpu, cset);
+}
+
+static int
+hwloc_freebsd_set_sth_affinity(hwloc_topology_t topology __hwloc_attribute_unused, cpulevel_t level, cpuwhich_t which, id_t id, hwloc_const_bitmap_t hwloc_cpuset, int flags __hwloc_attribute_unused)
+{
+  cpuset_t cset;
+
+  hwloc_freebsd_hwloc2bsd(hwloc_cpuset, &cset);
+
+  if (cpuset_setaffinity(level, which, id, sizeof(cset), &cset))
+    return -1;
+
+  return 0;
+}
+
+static int
+hwloc_freebsd_get_sth_affinity(hwloc_topology_t topology __hwloc_attribute_unused, cpulevel_t level, cpuwhich_t which, id_t id, hwloc_bitmap_t hwloc_cpuset, int flags __hwloc_attribute_unused)
+{
+  cpuset_t cset;
+
+  if (cpuset_getaffinity(level, which, id, sizeof(cset), &cset))
+    return -1;
+
+  hwloc_freebsd_bsd2hwloc(hwloc_cpuset, &cset);
+  return 0;
+}
+
+static int
+hwloc_freebsd_set_thisproc_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_cpuset, int flags)
+{
+  return hwloc_freebsd_set_sth_affinity(topology, CPU_LEVEL_WHICH, CPU_WHICH_PID, -1, hwloc_cpuset, flags);
+}
+
+static int
+hwloc_freebsd_get_thisproc_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_cpuset, int flags)
+{
+  return hwloc_freebsd_get_sth_affinity(topology, CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
hwloc_cpuset, flags); +} + +static int +hwloc_freebsd_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_cpuset, int flags) +{ + return hwloc_freebsd_set_sth_affinity(topology, CPU_LEVEL_WHICH, CPU_WHICH_TID, -1, hwloc_cpuset, flags); +} + +static int +hwloc_freebsd_get_thisthread_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_cpuset, int flags) +{ + return hwloc_freebsd_get_sth_affinity(topology, CPU_LEVEL_WHICH, CPU_WHICH_TID, -1, hwloc_cpuset, flags); +} + +static int +hwloc_freebsd_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_bitmap_t hwloc_cpuset, int flags) +{ + return hwloc_freebsd_set_sth_affinity(topology, CPU_LEVEL_WHICH, CPU_WHICH_PID, pid, hwloc_cpuset, flags); +} + +static int +hwloc_freebsd_get_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_bitmap_t hwloc_cpuset, int flags) +{ + return hwloc_freebsd_get_sth_affinity(topology, CPU_LEVEL_WHICH, CPU_WHICH_PID, pid, hwloc_cpuset, flags); +} + +#ifdef hwloc_thread_t + +#if HAVE_DECL_PTHREAD_SETAFFINITY_NP +#pragma weak pthread_setaffinity_np +static int +hwloc_freebsd_set_thread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_thread_t tid, hwloc_const_bitmap_t hwloc_cpuset, int flags __hwloc_attribute_unused) +{ + int err; + cpuset_t cset; + + if (!pthread_setaffinity_np) { + errno = ENOSYS; + return -1; + } + + hwloc_freebsd_hwloc2bsd(hwloc_cpuset, &cset); + + err = pthread_setaffinity_np(tid, sizeof(cset), &cset); + + if (err) { + errno = err; + return -1; + } + + return 0; +} +#endif + +#if HAVE_DECL_PTHREAD_GETAFFINITY_NP +#pragma weak pthread_getaffinity_np +static int +hwloc_freebsd_get_thread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_thread_t tid, hwloc_bitmap_t hwloc_cpuset, int flags __hwloc_attribute_unused) +{ + int err; + cpuset_t cset; + + if (!pthread_getaffinity_np) { + errno = ENOSYS; + return -1; + } + + err = pthread_getaffinity_np(tid, sizeof(cset), &cset); + + if (err) { + errno = err; + return -1; + } + + hwloc_freebsd_bsd2hwloc(hwloc_cpuset, &cset); + return 0; +} +#endif +#endif +#endif + +#if (defined HAVE_SYSCTL) && (defined HAVE_SYS_SYSCTL_H) +static void +hwloc_freebsd_node_meminfo_info(struct hwloc_topology *topology) +{ + int mib[2] = { CTL_HW, HW_PHYSMEM }; + size_t len = sizeof(topology->levels[0][0]->memory.local_memory); + sysctl(mib, 2, &topology->levels[0][0]->memory.local_memory, &len, NULL, 0); +} +#endif + +static int +hwloc_look_freebsd(struct hwloc_backend *backend) +{ + struct hwloc_topology *topology = backend->topology; + unsigned nbprocs = hwloc_fallback_nbprocessors(topology); + + if (!topology->levels[0][0]->cpuset) { + /* Nobody (even the x86 backend) created objects yet, setup basic objects */ + hwloc_alloc_obj_cpusets(topology->levels[0][0]); + hwloc_setup_pu_level(topology, nbprocs); + } + + /* Add FreeBSD specific information */ +#if (defined HAVE_SYSCTL) && (defined HAVE_SYS_SYSCTL_H) + hwloc_freebsd_node_meminfo_info(topology); +#endif + hwloc_obj_add_info(topology->levels[0][0], "Backend", "FreeBSD"); + if (topology->is_thissystem) + hwloc_add_uname_info(topology); + return 1; +} + +void +hwloc_set_freebsd_hooks(struct hwloc_binding_hooks *hooks __hwloc_attribute_unused, + struct hwloc_topology_support *support __hwloc_attribute_unused) +{ +#if defined(HAVE_SYS_CPUSET_H) && defined(HAVE_CPUSET_SETAFFINITY) + hooks->set_thisproc_cpubind = hwloc_freebsd_set_thisproc_cpubind; + hooks->get_thisproc_cpubind = hwloc_freebsd_get_thisproc_cpubind; + 
hooks->set_thisthread_cpubind = hwloc_freebsd_set_thisthread_cpubind; + hooks->get_thisthread_cpubind = hwloc_freebsd_get_thisthread_cpubind; + hooks->set_proc_cpubind = hwloc_freebsd_set_proc_cpubind; + hooks->get_proc_cpubind = hwloc_freebsd_get_proc_cpubind; +#ifdef hwloc_thread_t +#if HAVE_DECL_PTHREAD_SETAFFINITY_NP + hooks->set_thread_cpubind = hwloc_freebsd_set_thread_cpubind; +#endif +#if HAVE_DECL_PTHREAD_GETAFFINITY_NP + hooks->get_thread_cpubind = hwloc_freebsd_get_thread_cpubind; +#endif +#endif +#endif + /* TODO: get_last_cpu_location: find out ki_lastcpu */ +} + +static struct hwloc_backend * +hwloc_freebsd_component_instantiate(struct hwloc_disc_component *component, + const void *_data1 __hwloc_attribute_unused, + const void *_data2 __hwloc_attribute_unused, + const void *_data3 __hwloc_attribute_unused) +{ + struct hwloc_backend *backend; + backend = hwloc_backend_alloc(component); + if (!backend) + return NULL; + backend->discover = hwloc_look_freebsd; + return backend; +} + +static struct hwloc_disc_component hwloc_freebsd_disc_component = { + HWLOC_DISC_COMPONENT_TYPE_CPU, + "freebsd", + HWLOC_DISC_COMPONENT_TYPE_GLOBAL, + hwloc_freebsd_component_instantiate, + 50, + NULL +}; + +const struct hwloc_component hwloc_freebsd_component = { + HWLOC_COMPONENT_ABI, + HWLOC_COMPONENT_TYPE_DISC, + 0, + &hwloc_freebsd_disc_component +}; diff --git a/ext/hwloc/src/topology-linux.c b/ext/hwloc/src/topology-linux.c new file mode 100644 index 000000000..b49f10912 --- /dev/null +++ b/ext/hwloc/src/topology-linux.c @@ -0,0 +1,4606 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2014 Inria. All rights reserved. + * Copyright © 2009-2013 Université Bordeaux 1 + * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. + * Copyright © 2010 IBM + * See COPYING in top-level directory. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#ifdef HAVE_DIRENT_H +#include +#endif +#ifdef HAVE_UNISTD_H +#include +#endif +#include +#include +#include +#include +#include +#include +#if defined HWLOC_HAVE_SET_MEMPOLICY || defined HWLOC_HAVE_MBIND +#define migratepages migrate_pages /* workaround broken migratepages prototype in numaif.h before libnuma 2.0.2 */ +#include +#endif + +struct hwloc_linux_backend_data_s { + int root_fd; /* The file descriptor for the file system root, used when browsing, e.g., Linux' sysfs and procfs. 
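 * This is normally the real root, but hwloc can also be pointed at a dumped
 * /proc + /sys tree (hwloc_topology_set_fsroot() or the HWLOC_FSROOT
 * environment variable), in which case is_real_fsroot below is 0.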
*/ + int is_real_fsroot; /* Boolean saying whether root_fd points to the real filesystem root of the system */ + int deprecated_classlinks_model; /* -2 if never tried, -1 if unknown, 0 if new (device contains class/name), 1 if old (device contains class:name) */ + int mic_need_directlookup; /* if not tried yet, 0 if not needed, 1 if needed */ + unsigned mic_directlookup_id_max; /* -1 if not tried yet, 0 if none to lookup, maxid+1 otherwise */ +}; + + + +/*************************** + * Misc Abstraction layers * + ***************************/ + +#if !(defined HWLOC_HAVE_SCHED_SETAFFINITY) && (defined HWLOC_HAVE__SYSCALL3) +/* libc doesn't have support for sched_setaffinity, build system call + * ourselves: */ +# include +# ifndef __NR_sched_setaffinity +# ifdef __i386__ +# define __NR_sched_setaffinity 241 +# elif defined(__x86_64__) +# define __NR_sched_setaffinity 203 +# elif defined(__ia64__) +# define __NR_sched_setaffinity 1231 +# elif defined(__hppa__) +# define __NR_sched_setaffinity 211 +# elif defined(__alpha__) +# define __NR_sched_setaffinity 395 +# elif defined(__s390__) +# define __NR_sched_setaffinity 239 +# elif defined(__sparc__) +# define __NR_sched_setaffinity 261 +# elif defined(__m68k__) +# define __NR_sched_setaffinity 311 +# elif defined(__powerpc__) || defined(__ppc__) || defined(__PPC__) || defined(__powerpc64__) || defined(__ppc64__) +# define __NR_sched_setaffinity 222 +# elif defined(__arm__) +# define __NR_sched_setaffinity 241 +# elif defined(__cris__) +# define __NR_sched_setaffinity 241 +/*# elif defined(__mips__) + # define __NR_sched_setaffinity TODO (32/64/nabi) */ +# else +# warning "don't know the syscall number for sched_setaffinity on this architecture, will not support binding" +# define sched_setaffinity(pid, lg, mask) (errno = ENOSYS, -1) +# endif +# endif +# ifndef sched_setaffinity + _syscall3(int, sched_setaffinity, pid_t, pid, unsigned int, lg, const void *, mask) +# endif +# ifndef __NR_sched_getaffinity +# ifdef __i386__ +# define __NR_sched_getaffinity 242 +# elif defined(__x86_64__) +# define __NR_sched_getaffinity 204 +# elif defined(__ia64__) +# define __NR_sched_getaffinity 1232 +# elif defined(__hppa__) +# define __NR_sched_getaffinity 212 +# elif defined(__alpha__) +# define __NR_sched_getaffinity 396 +# elif defined(__s390__) +# define __NR_sched_getaffinity 240 +# elif defined(__sparc__) +# define __NR_sched_getaffinity 260 +# elif defined(__m68k__) +# define __NR_sched_getaffinity 312 +# elif defined(__powerpc__) || defined(__ppc__) || defined(__PPC__) || defined(__powerpc64__) || defined(__ppc64__) +# define __NR_sched_getaffinity 223 +# elif defined(__arm__) +# define __NR_sched_getaffinity 242 +# elif defined(__cris__) +# define __NR_sched_getaffinity 242 +/*# elif defined(__mips__) + # define __NR_sched_getaffinity TODO (32/64/nabi) */ +# else +# warning "don't know the syscall number for sched_getaffinity on this architecture, will not support getting binding" +# define sched_getaffinity(pid, lg, mask) (errno = ENOSYS, -1) +# endif +# endif +# ifndef sched_getaffinity + _syscall3(int, sched_getaffinity, pid_t, pid, unsigned int, lg, void *, mask) +# endif +#endif + +/* Added for ntohl() */ +#include + +#ifdef HAVE_OPENAT +/* Use our own filesystem functions if we have openat */ + +static const char * +hwloc_checkat(const char *path, int fsroot_fd) +{ + const char *relative_path; + if (fsroot_fd < 0) { + errno = EBADF; + return NULL; + } + + /* Skip leading slashes. 
*/ + for (relative_path = path; *relative_path == '/'; relative_path++); + + return relative_path; +} + +static int +hwloc_openat(const char *path, int fsroot_fd) +{ + const char *relative_path; + + relative_path = hwloc_checkat(path, fsroot_fd); + if (!relative_path) + return -1; + + return openat (fsroot_fd, relative_path, O_RDONLY); +} + +static FILE * +hwloc_fopenat(const char *path, const char *mode, int fsroot_fd) +{ + int fd; + + if (strcmp(mode, "r")) { + errno = ENOTSUP; + return NULL; + } + + fd = hwloc_openat (path, fsroot_fd); + if (fd == -1) + return NULL; + + return fdopen(fd, mode); +} + +static int +hwloc_accessat(const char *path, int mode, int fsroot_fd) +{ + const char *relative_path; + + relative_path = hwloc_checkat(path, fsroot_fd); + if (!relative_path) + return -1; + + return faccessat(fsroot_fd, relative_path, mode, 0); +} + +static int +hwloc_fstatat(const char *path, struct stat *st, int flags, int fsroot_fd) +{ + const char *relative_path; + + relative_path = hwloc_checkat(path, fsroot_fd); + if (!relative_path) + return -1; + + return fstatat(fsroot_fd, relative_path, st, flags); +} + +static DIR* +hwloc_opendirat(const char *path, int fsroot_fd) +{ + int dir_fd; + const char *relative_path; + + relative_path = hwloc_checkat(path, fsroot_fd); + if (!relative_path) + return NULL; + + dir_fd = openat(fsroot_fd, relative_path, O_RDONLY | O_DIRECTORY); + if (dir_fd < 0) + return NULL; + + return fdopendir(dir_fd); +} + +#endif /* HAVE_OPENAT */ + +/* Static inline version of fopen so that we can use openat if we have + it, but still preserve compiler parameter checking */ +static __hwloc_inline int +hwloc_open(const char *p, int d __hwloc_attribute_unused) +{ +#ifdef HAVE_OPENAT + return hwloc_openat(p, d); +#else + return open(p, O_RDONLY); +#endif +} + +static __hwloc_inline FILE * +hwloc_fopen(const char *p, const char *m, int d __hwloc_attribute_unused) +{ +#ifdef HAVE_OPENAT + return hwloc_fopenat(p, m, d); +#else + return fopen(p, m); +#endif +} + +/* Static inline version of access so that we can use openat if we have + it, but still preserve compiler parameter checking */ +static __hwloc_inline int +hwloc_access(const char *p, int m, int d __hwloc_attribute_unused) +{ +#ifdef HAVE_OPENAT + return hwloc_accessat(p, m, d); +#else + return access(p, m); +#endif +} + +static __hwloc_inline int +hwloc_stat(const char *p, struct stat *st, int d __hwloc_attribute_unused) +{ +#ifdef HAVE_OPENAT + return hwloc_fstatat(p, st, 0, d); +#else + return stat(p, st); +#endif +} + +static __hwloc_inline int +hwloc_lstat(const char *p, struct stat *st, int d __hwloc_attribute_unused) +{ +#ifdef HAVE_OPENAT + return hwloc_fstatat(p, st, AT_SYMLINK_NOFOLLOW, d); +#else + return lstat(p, st); +#endif +} + +/* Static inline version of opendir so that we can use openat if we have + it, but still preserve compiler parameter checking */ +static __hwloc_inline DIR * +hwloc_opendir(const char *p, int d __hwloc_attribute_unused) +{ +#ifdef HAVE_OPENAT + return hwloc_opendirat(p, d); +#else + return opendir(p); +#endif +} + + +/***************************** + ******* CpuBind Hooks ******* + *****************************/ + +int +hwloc_linux_set_tid_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, pid_t tid __hwloc_attribute_unused, hwloc_const_bitmap_t hwloc_set __hwloc_attribute_unused) +{ + /* TODO Kerrighed: Use + * int migrate (pid_t pid, int destination_node); + * int migrate_self (int destination_node); + * int thread_migrate (int thread_id, int destination_node); + */ 
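/* Editor's usage sketch: this entry point is also exported to applications
 * through hwloc/linux.h, e.g. to bind an arbitrary kernel thread id:
 *   hwloc_bitmap_t set = hwloc_bitmap_alloc();
 *   hwloc_bitmap_only(set, 2);                       (bind to PU #2 only)
 *   hwloc_linux_set_tid_cpubind(topology, tid, set);
 *   hwloc_bitmap_free(set);
 */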
+ + /* The resulting binding is always strict */ + +#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) + cpu_set_t *plinux_set; + unsigned cpu; + int last; + size_t setsize; + int err; + + last = hwloc_bitmap_last(hwloc_set); + if (last == -1) { + errno = EINVAL; + return -1; + } + + setsize = CPU_ALLOC_SIZE(last+1); + plinux_set = CPU_ALLOC(last+1); + + CPU_ZERO_S(setsize, plinux_set); + hwloc_bitmap_foreach_begin(cpu, hwloc_set) + CPU_SET_S(cpu, setsize, plinux_set); + hwloc_bitmap_foreach_end(); + + err = sched_setaffinity(tid, setsize, plinux_set); + + CPU_FREE(plinux_set); + return err; +#elif defined(HWLOC_HAVE_CPU_SET) + cpu_set_t linux_set; + unsigned cpu; + + CPU_ZERO(&linux_set); + hwloc_bitmap_foreach_begin(cpu, hwloc_set) + CPU_SET(cpu, &linux_set); + hwloc_bitmap_foreach_end(); + +#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY + return sched_setaffinity(tid, &linux_set); +#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + return sched_setaffinity(tid, sizeof(linux_set), &linux_set); +#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ +#elif defined(HWLOC_HAVE__SYSCALL3) + unsigned long mask = hwloc_bitmap_to_ulong(hwloc_set); + +#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY + return sched_setaffinity(tid, (void*) &mask); +#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + return sched_setaffinity(tid, sizeof(mask), (void*) &mask); +#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ +#else /* !_SYSCALL3 */ + errno = ENOSYS; + return -1; +#endif /* !_SYSCALL3 */ +} + +#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) +static int +hwloc_linux_parse_cpuset_file(FILE *file, hwloc_bitmap_t set) +{ + unsigned long start, stop; + + /* reset to zero first */ + hwloc_bitmap_zero(set); + + while (fscanf(file, "%lu", &start) == 1) + { + int c = fgetc(file); + + stop = start; + + if (c == '-') { + /* Range */ + if (fscanf(file, "%lu", &stop) != 1) { + /* Expected a number here */ + errno = EINVAL; + return -1; + } + c = fgetc(file); + } + + if (c == EOF || c == '\n') { + hwloc_bitmap_set_range(set, start, stop); + break; + } + + if (c != ',') { + /* Expected EOF, EOL, or a comma */ + errno = EINVAL; + return -1; + } + + hwloc_bitmap_set_range(set, start, stop); + } + + return 0; +} + +/* + * On some kernels, sched_getaffinity requires the output size to be larger + * than the kernel cpu_set size (defined by CONFIG_NR_CPUS). + * Try sched_affinity on ourself until we find a nr_cpus value that makes + * the kernel happy. 
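 *
 * Editor's illustration: starting from nr_cpus = 1, CPU_ALLOC_SIZE(1) is
 * 8 bytes with glibc, so the first attempt passes a 64-bit mask; if the
 * kernel was built with CONFIG_NR_CPUS = 4096, sched_getaffinity() keeps
 * failing with EINVAL until the doubling reaches a 4096-bit mask, and
 * 4096 is then cached in _nr_cpus for later calls.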
+ */ +static int +hwloc_linux_find_kernel_nr_cpus(hwloc_topology_t topology) +{ + static int _nr_cpus = -1; + int nr_cpus = _nr_cpus; + FILE *possible; + + if (nr_cpus != -1) + /* already computed */ + return nr_cpus; + + if (topology->levels[0][0]->complete_cpuset) + /* start with a nr_cpus that may contain the whole topology */ + nr_cpus = hwloc_bitmap_last(topology->levels[0][0]->complete_cpuset) + 1; + if (nr_cpus <= 0) + /* start from scratch, the topology isn't ready yet (complete_cpuset is missing (-1) or empty (0))*/ + nr_cpus = 1; + + possible = fopen("/sys/devices/system/cpu/possible", "r"); + if (possible) { + hwloc_bitmap_t possible_bitmap = hwloc_bitmap_alloc(); + if (hwloc_linux_parse_cpuset_file(possible, possible_bitmap) == 0) { + int max_possible = hwloc_bitmap_last(possible_bitmap); + + hwloc_debug_bitmap("possible CPUs are %s\n", possible_bitmap); + + if (nr_cpus < max_possible + 1) + nr_cpus = max_possible + 1; + } + fclose(possible); + hwloc_bitmap_free(possible_bitmap); + } + + while (1) { + cpu_set_t *set = CPU_ALLOC(nr_cpus); + size_t setsize = CPU_ALLOC_SIZE(nr_cpus); + int err = sched_getaffinity(0, setsize, set); /* always works, unless setsize is too small */ + CPU_FREE(set); + nr_cpus = setsize * 8; /* that's the value that was actually tested */ + if (!err) + /* found it */ + return _nr_cpus = nr_cpus; + nr_cpus *= 2; + } +} +#endif + +int +hwloc_linux_get_tid_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, pid_t tid __hwloc_attribute_unused, hwloc_bitmap_t hwloc_set __hwloc_attribute_unused) +{ + int err __hwloc_attribute_unused; + /* TODO Kerrighed */ + +#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) + cpu_set_t *plinux_set; + unsigned cpu; + int last; + size_t setsize; + int kernel_nr_cpus; + + /* find the kernel nr_cpus so as to use a large enough cpu_set size */ + kernel_nr_cpus = hwloc_linux_find_kernel_nr_cpus(topology); + setsize = CPU_ALLOC_SIZE(kernel_nr_cpus); + plinux_set = CPU_ALLOC(kernel_nr_cpus); + + err = sched_getaffinity(tid, setsize, plinux_set); + + if (err < 0) { + CPU_FREE(plinux_set); + return -1; + } + + last = -1; + if (topology->levels[0][0]->complete_cpuset) + last = hwloc_bitmap_last(topology->levels[0][0]->complete_cpuset); + if (last == -1) + /* round the maximal support number, the topology isn't ready yet (complete_cpuset is missing or empty)*/ + last = kernel_nr_cpus-1; + + hwloc_bitmap_zero(hwloc_set); + for(cpu=0; cpu<=(unsigned) last; cpu++) + if (CPU_ISSET_S(cpu, setsize, plinux_set)) + hwloc_bitmap_set(hwloc_set, cpu); + + CPU_FREE(plinux_set); +#elif defined(HWLOC_HAVE_CPU_SET) + cpu_set_t linux_set; + unsigned cpu; + +#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY + err = sched_getaffinity(tid, &linux_set); +#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + err = sched_getaffinity(tid, sizeof(linux_set), &linux_set); +#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + if (err < 0) + return -1; + + hwloc_bitmap_zero(hwloc_set); + for(cpu=0; cpud_name, ".") || !strcmp(dirent->d_name, "..")) + continue; + tids[nr_tids++] = atoi(dirent->d_name); + } + + *nr_tidsp = nr_tids; + *tidsp = tids; + return 0; +} + +/* Per-tid callbacks */ +typedef int (*hwloc_linux_foreach_proc_tid_cb_t)(hwloc_topology_t topology, pid_t tid, void *data, int idx); + +static int +hwloc_linux_foreach_proc_tid(hwloc_topology_t topology, + pid_t pid, hwloc_linux_foreach_proc_tid_cb_t cb, + void *data) +{ + char taskdir_path[128]; + DIR *taskdir; + pid_t *tids, *newtids; + unsigned i, nr, newnr, failed = 0, failed_errno = 
0;
+  unsigned retrynr = 0;
+  int err;
+
+  if (pid)
+    snprintf(taskdir_path, sizeof(taskdir_path), "/proc/%u/task", (unsigned) pid);
+  else
+    snprintf(taskdir_path, sizeof(taskdir_path), "/proc/self/task");
+
+  taskdir = opendir(taskdir_path);
+  if (!taskdir) {
+    if (errno == ENOENT)
+      errno = EINVAL;
+    err = -1;
+    goto out;
+  }
+
+  /* read the current list of threads */
+  err = hwloc_linux_get_proc_tids(taskdir, &nr, &tids);
+  if (err < 0)
+    goto out_with_dir;
+
+ retry:
+  /* apply the callback to all threads */
+  failed=0;
+  for(i=0; i<nr; i++) {
+    err = cb(topology, tids[i], data, i);
+    if (err < 0) {
+      failed++;
+      failed_errno = errno;
+    }
+  }
+
+  /* re-read the list of threads and retry if it changed in the meantime */
+  err = hwloc_linux_get_proc_tids(taskdir, &newnr, &newtids);
+  if (err < 0)
+    goto out_with_tids;
+  if (newnr != nr || memcmp(newtids, tids, nr*sizeof(pid_t))) {
+    free(tids);
+    tids = newtids;
+    nr = newnr;
+    if (++retrynr > 10) {
+      /* we tried 10 times, it didn't work, the application is probably creating/destroying many threads, stop trying */
+      errno = EAGAIN;
+      err = -1;
+      goto out_with_tids;
+    }
+    goto retry;
+  } else {
+    free(newtids);
+  }
+
+  /* if all threads failed, return the last errno. */
+  if (failed) {
+    err = -1;
+    errno = failed_errno;
+    goto out_with_tids;
+  }
+
+  err = 0;
+ out_with_tids:
+  free(tids);
+ out_with_dir:
+  closedir(taskdir);
+ out:
+  return err;
+}
+
+/* Per-tid proc_set_cpubind callback and caller.
+ * Callback data is a hwloc_bitmap_t. */
+static int
+hwloc_linux_foreach_proc_tid_set_cpubind_cb(hwloc_topology_t topology, pid_t tid, void *data, int idx __hwloc_attribute_unused)
+{
+  return hwloc_linux_set_tid_cpubind(topology, tid, (hwloc_bitmap_t) data);
+}
+
+static int
+hwloc_linux_set_pid_cpubind(hwloc_topology_t topology, pid_t pid, hwloc_const_bitmap_t hwloc_set, int flags __hwloc_attribute_unused)
+{
+  return hwloc_linux_foreach_proc_tid(topology, pid,
+				      hwloc_linux_foreach_proc_tid_set_cpubind_cb,
+				      (void*) hwloc_set);
+}
+
+/* Per-tid proc_get_cpubind callback data, callback function and caller */
+struct hwloc_linux_foreach_proc_tid_get_cpubind_cb_data_s {
+  hwloc_bitmap_t cpuset;
+  hwloc_bitmap_t tidset;
+  int flags;
+};
+
+static int
+hwloc_linux_foreach_proc_tid_get_cpubind_cb(hwloc_topology_t topology, pid_t tid, void *_data, int idx)
+{
+  struct hwloc_linux_foreach_proc_tid_get_cpubind_cb_data_s *data = _data;
+  hwloc_bitmap_t cpuset = data->cpuset;
+  hwloc_bitmap_t tidset = data->tidset;
+  int flags = data->flags;
+
+  if (hwloc_linux_get_tid_cpubind(topology, tid, tidset))
+    return -1;
+
+  /* reset the cpuset on first iteration */
+  if (!idx)
+    hwloc_bitmap_zero(cpuset);
+
+  if (flags & HWLOC_CPUBIND_STRICT) {
+    /* if STRICT, we want all threads to have the same binding */
+    if (!idx) {
+      /* this is the first thread, copy its binding */
+      hwloc_bitmap_copy(cpuset, tidset);
+    } else if (!hwloc_bitmap_isequal(cpuset, tidset)) {
+      /* this is not the first thread, and its binding is different */
+      errno = EXDEV;
+      return -1;
+    }
+  } else {
+    /* if not STRICT, just OR all thread bindings */
+    hwloc_bitmap_or(cpuset, cpuset, tidset);
+  }
+  return 0;
+}
+
+static int
+hwloc_linux_get_pid_cpubind(hwloc_topology_t topology, pid_t pid, hwloc_bitmap_t hwloc_set, int flags)
+{
+  struct hwloc_linux_foreach_proc_tid_get_cpubind_cb_data_s data;
+  hwloc_bitmap_t tidset = hwloc_bitmap_alloc();
+  int ret;
+
+  data.cpuset = hwloc_set;
+  data.tidset = tidset;
+  data.flags = flags;
+  ret = hwloc_linux_foreach_proc_tid(topology, pid,
+				     hwloc_linux_foreach_proc_tid_get_cpubind_cb,
+				     (void*) &data);
+  hwloc_bitmap_free(tidset);
+  return ret;
+}
+
+static int
+hwloc_linux_set_proc_cpubind(hwloc_topology_t topology, pid_t pid, hwloc_const_bitmap_t hwloc_set, int flags)
+{
+  if (pid == 0)
+    pid = topology->pid;
+  if (flags & HWLOC_CPUBIND_THREAD)
+    return hwloc_linux_set_tid_cpubind(topology, pid, hwloc_set);
+  else
+    return
hwloc_linux_set_pid_cpubind(topology, pid, hwloc_set, flags); +} + +static int +hwloc_linux_get_proc_cpubind(hwloc_topology_t topology, pid_t pid, hwloc_bitmap_t hwloc_set, int flags) +{ + if (pid == 0) + pid = topology->pid; + if (flags & HWLOC_CPUBIND_THREAD) + return hwloc_linux_get_tid_cpubind(topology, pid, hwloc_set); + else + return hwloc_linux_get_pid_cpubind(topology, pid, hwloc_set, flags); +} + +static int +hwloc_linux_set_thisproc_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) +{ + return hwloc_linux_set_pid_cpubind(topology, topology->pid, hwloc_set, flags); +} + +static int +hwloc_linux_get_thisproc_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags) +{ + return hwloc_linux_get_pid_cpubind(topology, topology->pid, hwloc_set, flags); +} + +static int +hwloc_linux_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) +{ + if (topology->pid) { + errno = ENOSYS; + return -1; + } + return hwloc_linux_set_tid_cpubind(topology, 0, hwloc_set); +} + +static int +hwloc_linux_get_thisthread_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) +{ + if (topology->pid) { + errno = ENOSYS; + return -1; + } + return hwloc_linux_get_tid_cpubind(topology, 0, hwloc_set); +} + +#if HAVE_DECL_PTHREAD_SETAFFINITY_NP +#pragma weak pthread_setaffinity_np +#pragma weak pthread_self + +static int +hwloc_linux_set_thread_cpubind(hwloc_topology_t topology, pthread_t tid, hwloc_const_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) +{ + int err; + + if (topology->pid) { + errno = ENOSYS; + return -1; + } + + if (!pthread_self) { + /* ?! Application uses set_thread_cpubind, but doesn't link against libpthread ?! 
*/ + errno = ENOSYS; + return -1; + } + if (tid == pthread_self()) + return hwloc_linux_set_tid_cpubind(topology, 0, hwloc_set); + + if (!pthread_setaffinity_np) { + errno = ENOSYS; + return -1; + } + /* TODO Kerrighed: Use + * int migrate (pid_t pid, int destination_node); + * int migrate_self (int destination_node); + * int thread_migrate (int thread_id, int destination_node); + */ + +#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) + /* Use a separate block so that we can define specific variable + types here */ + { + cpu_set_t *plinux_set; + unsigned cpu; + int last; + size_t setsize; + + last = hwloc_bitmap_last(hwloc_set); + if (last == -1) { + errno = EINVAL; + return -1; + } + + setsize = CPU_ALLOC_SIZE(last+1); + plinux_set = CPU_ALLOC(last+1); + + CPU_ZERO_S(setsize, plinux_set); + hwloc_bitmap_foreach_begin(cpu, hwloc_set) + CPU_SET_S(cpu, setsize, plinux_set); + hwloc_bitmap_foreach_end(); + + err = pthread_setaffinity_np(tid, setsize, plinux_set); + + CPU_FREE(plinux_set); + } +#elif defined(HWLOC_HAVE_CPU_SET) + /* Use a separate block so that we can define specific variable + types here */ + { + cpu_set_t linux_set; + unsigned cpu; + + CPU_ZERO(&linux_set); + hwloc_bitmap_foreach_begin(cpu, hwloc_set) + CPU_SET(cpu, &linux_set); + hwloc_bitmap_foreach_end(); + +#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY + err = pthread_setaffinity_np(tid, &linux_set); +#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + err = pthread_setaffinity_np(tid, sizeof(linux_set), &linux_set); +#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + } +#else /* CPU_SET */ + /* Use a separate block so that we can define specific variable + types here */ + { + unsigned long mask = hwloc_bitmap_to_ulong(hwloc_set); + +#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY + err = pthread_setaffinity_np(tid, (void*) &mask); +#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + err = pthread_setaffinity_np(tid, sizeof(mask), (void*) &mask); +#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + } +#endif /* CPU_SET */ + + if (err) { + errno = err; + return -1; + } + return 0; +} +#endif /* HAVE_DECL_PTHREAD_SETAFFINITY_NP */ + +#if HAVE_DECL_PTHREAD_GETAFFINITY_NP +#pragma weak pthread_getaffinity_np +#pragma weak pthread_self + +static int +hwloc_linux_get_thread_cpubind(hwloc_topology_t topology, pthread_t tid, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) +{ + int err; + + if (topology->pid) { + errno = ENOSYS; + return -1; + } + + if (!pthread_self) { + /* ?! Application uses set_thread_cpubind, but doesn't link against libpthread ?! 
*/ + errno = ENOSYS; + return -1; + } + if (tid == pthread_self()) + return hwloc_linux_get_tid_cpubind(topology, 0, hwloc_set); + + if (!pthread_getaffinity_np) { + errno = ENOSYS; + return -1; + } + /* TODO Kerrighed */ + +#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) + /* Use a separate block so that we can define specific variable + types here */ + { + cpu_set_t *plinux_set; + unsigned cpu; + int last; + size_t setsize; + + last = hwloc_bitmap_last(topology->levels[0][0]->complete_cpuset); + assert (last != -1); + + setsize = CPU_ALLOC_SIZE(last+1); + plinux_set = CPU_ALLOC(last+1); + + err = pthread_getaffinity_np(tid, setsize, plinux_set); + if (err) { + CPU_FREE(plinux_set); + errno = err; + return -1; + } + + hwloc_bitmap_zero(hwloc_set); + for(cpu=0; cpu<=(unsigned) last; cpu++) + if (CPU_ISSET_S(cpu, setsize, plinux_set)) + hwloc_bitmap_set(hwloc_set, cpu); + + CPU_FREE(plinux_set); + } +#elif defined(HWLOC_HAVE_CPU_SET) + /* Use a separate block so that we can define specific variable + types here */ + { + cpu_set_t linux_set; + unsigned cpu; + +#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY + err = pthread_getaffinity_np(tid, &linux_set); +#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + err = pthread_getaffinity_np(tid, sizeof(linux_set), &linux_set); +#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ + if (err) { + errno = err; + return -1; + } + + hwloc_bitmap_zero(hwloc_set); + for(cpu=0; cpucpuset; + hwloc_bitmap_t tidset = data->tidset; + + if (hwloc_linux_get_tid_last_cpu_location(topology, tid, tidset)) + return -1; + + /* reset the cpuset on first iteration */ + if (!idx) + hwloc_bitmap_zero(cpuset); + + hwloc_bitmap_or(cpuset, cpuset, tidset); + return 0; +} + +static int +hwloc_linux_get_pid_last_cpu_location(hwloc_topology_t topology, pid_t pid, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) +{ + struct hwloc_linux_foreach_proc_tid_get_last_cpu_location_cb_data_s data; + hwloc_bitmap_t tidset = hwloc_bitmap_alloc(); + int ret; + + data.cpuset = hwloc_set; + data.tidset = tidset; + ret = hwloc_linux_foreach_proc_tid(topology, pid, + hwloc_linux_foreach_proc_tid_get_last_cpu_location_cb, + &data); + hwloc_bitmap_free(tidset); + return ret; +} + +static int +hwloc_linux_get_proc_last_cpu_location(hwloc_topology_t topology, pid_t pid, hwloc_bitmap_t hwloc_set, int flags) +{ + if (pid == 0) + pid = topology->pid; + if (flags & HWLOC_CPUBIND_THREAD) + return hwloc_linux_get_tid_last_cpu_location(topology, pid, hwloc_set); + else + return hwloc_linux_get_pid_last_cpu_location(topology, pid, hwloc_set, flags); +} + +static int +hwloc_linux_get_thisproc_last_cpu_location(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags) +{ + return hwloc_linux_get_pid_last_cpu_location(topology, topology->pid, hwloc_set, flags); +} + +static int +hwloc_linux_get_thisthread_last_cpu_location(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) +{ + if (topology->pid) { + errno = ENOSYS; + return -1; + } + return hwloc_linux_get_tid_last_cpu_location(topology, 0, hwloc_set); +} + + + +/*************************** + ****** Membind hooks ****** + ***************************/ + +#if defined HWLOC_HAVE_SET_MEMPOLICY || defined HWLOC_HAVE_MBIND +static int +hwloc_linux_membind_policy_from_hwloc(int *linuxpolicy, hwloc_membind_policy_t policy, int flags) +{ + switch (policy) { + case HWLOC_MEMBIND_DEFAULT: + case HWLOC_MEMBIND_FIRSTTOUCH: + *linuxpolicy = MPOL_DEFAULT; + break; + case HWLOC_MEMBIND_BIND: + if 
(flags & HWLOC_MEMBIND_STRICT) + *linuxpolicy = MPOL_BIND; + else + *linuxpolicy = MPOL_PREFERRED; + break; + case HWLOC_MEMBIND_INTERLEAVE: + *linuxpolicy = MPOL_INTERLEAVE; + break; + /* TODO: next-touch when (if?) patch applied upstream */ + default: + errno = ENOSYS; + return -1; + } + return 0; +} + +static int +hwloc_linux_membind_mask_from_nodeset(hwloc_topology_t topology __hwloc_attribute_unused, + hwloc_const_nodeset_t nodeset, + unsigned *max_os_index_p, unsigned long **linuxmaskp) +{ + unsigned max_os_index = 0; /* highest os_index + 1 */ + unsigned long *linuxmask; + unsigned i; + hwloc_nodeset_t linux_nodeset = NULL; + + if (hwloc_bitmap_isfull(nodeset)) { + linux_nodeset = hwloc_bitmap_alloc(); + hwloc_bitmap_only(linux_nodeset, 0); + nodeset = linux_nodeset; + } + + max_os_index = hwloc_bitmap_last(nodeset); + if (max_os_index == (unsigned) -1) + max_os_index = 0; + /* add 1 to convert the last os_index into a max_os_index, + * and round up to the nearest multiple of BITS_PER_LONG */ + max_os_index = (max_os_index + 1 + HWLOC_BITS_PER_LONG - 1) & ~(HWLOC_BITS_PER_LONG - 1); + + linuxmask = calloc(max_os_index/HWLOC_BITS_PER_LONG, sizeof(long)); + if (!linuxmask) { + hwloc_bitmap_free(linux_nodeset); + errno = ENOMEM; + return -1; + } + + for(i=0; iset_thisthread_cpubind = hwloc_linux_set_thisthread_cpubind; + hooks->get_thisthread_cpubind = hwloc_linux_get_thisthread_cpubind; + hooks->set_thisproc_cpubind = hwloc_linux_set_thisproc_cpubind; + hooks->get_thisproc_cpubind = hwloc_linux_get_thisproc_cpubind; + hooks->set_proc_cpubind = hwloc_linux_set_proc_cpubind; + hooks->get_proc_cpubind = hwloc_linux_get_proc_cpubind; +#if HAVE_DECL_PTHREAD_SETAFFINITY_NP + hooks->set_thread_cpubind = hwloc_linux_set_thread_cpubind; +#endif /* HAVE_DECL_PTHREAD_SETAFFINITY_NP */ +#if HAVE_DECL_PTHREAD_GETAFFINITY_NP + hooks->get_thread_cpubind = hwloc_linux_get_thread_cpubind; +#endif /* HAVE_DECL_PTHREAD_GETAFFINITY_NP */ + hooks->get_thisthread_last_cpu_location = hwloc_linux_get_thisthread_last_cpu_location; + hooks->get_thisproc_last_cpu_location = hwloc_linux_get_thisproc_last_cpu_location; + hooks->get_proc_last_cpu_location = hwloc_linux_get_proc_last_cpu_location; +#ifdef HWLOC_HAVE_SET_MEMPOLICY + hooks->set_thisthread_membind = hwloc_linux_set_thisthread_membind; + hooks->get_thisthread_membind = hwloc_linux_get_thisthread_membind; + hooks->get_area_membind = hwloc_linux_get_area_membind; +#endif /* HWLOC_HAVE_SET_MEMPOLICY */ +#ifdef HWLOC_HAVE_MBIND + hooks->set_area_membind = hwloc_linux_set_area_membind; + hooks->alloc_membind = hwloc_linux_alloc_membind; + hooks->alloc = hwloc_alloc_mmap; + hooks->free_membind = hwloc_free_mmap; + support->membind->firsttouch_membind = 1; + support->membind->bind_membind = 1; + support->membind->interleave_membind = 1; +#endif /* HWLOC_HAVE_MBIND */ +#if (defined HWLOC_HAVE_MIGRATE_PAGES) || ((defined HWLOC_HAVE_MBIND) && (defined MPOL_MF_MOVE)) + support->membind->migrate_membind = 1; +#endif +} + + + +/******************************************* + *** Misc Helpers for Topology Discovery *** + *******************************************/ + +/* cpuinfo array */ +struct hwloc_linux_cpuinfo_proc { + /* set during hwloc_linux_parse_cpuinfo */ + unsigned long Pproc; + /* set during hwloc_linux_parse_cpuinfo or -1 if unknown*/ + long Pcore, Psock; + /* set later, or -1 if unknown */ + long Lcore, Lsock; + /* set during hwloc_linux_parse_cpuinfo or NULL if unknown */ + char *cpumodel; +}; + +static int +hwloc_parse_sysfs_unsigned(const char 
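+/* (The helper below reads single-value sysfs files such as
+ * /sys/devices/system/cpu/cpu0/topology/core_id, whose entire content is
+ * one decimal integer plus a newline, e.g. "3\n". Hedged usage sketch,
+ * with a hypothetical fsroot_fd:
+ *
+ *   unsigned id;
+ *   if (!hwloc_parse_sysfs_unsigned("/sys/devices/system/cpu/cpu0/topology/core_id",
+ *                                   &id, fsroot_fd))
+ *     printf("core id %u\n", id);
+ * ) */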
*mappath, unsigned *value, int fsroot_fd)
+{
+  char string[11];
+  FILE * fd;
+
+  fd = hwloc_fopen(mappath, "r", fsroot_fd);
+  if (!fd) {
+    *value = -1;
+    return -1;
+  }
+
+  if (!fgets(string, 11, fd)) {
+    *value = -1;
+    fclose(fd);
+    return -1;
+  }
+  *value = strtoul(string, NULL, 10);
+
+  fclose(fd);
+
+  return 0;
+}
+
+
+/* kernel cpumaps are composed of an array of 32bits cpumasks */
+#define KERNEL_CPU_MASK_BITS 32
+#define KERNEL_CPU_MAP_LEN (KERNEL_CPU_MASK_BITS/4+2)
+
+int
+hwloc_linux_parse_cpumap_file(FILE *file, hwloc_bitmap_t set)
+{
+  unsigned long *maps;
+  unsigned long map;
+  int nr_maps = 0;
+  static int nr_maps_allocated = 8; /* only compute the power-of-two above the kernel cpumask size once */
+  int i;
+
+  maps = malloc(nr_maps_allocated * sizeof(*maps));
+
+  /* reset to zero first */
+  hwloc_bitmap_zero(set);
+
+  /* parse the whole mask */
+  while (fscanf(file, "%lx,", &map) == 1) /* read one kernel cpu mask and the ending comma */
+    {
+      if (nr_maps == nr_maps_allocated) {
+        nr_maps_allocated *= 2;
+        maps = realloc(maps, nr_maps_allocated * sizeof(*maps));
+      }
+
+      if (!map && !nr_maps)
+        /* ignore the first map if it's empty */
+        continue;
+
+      memmove(&maps[1], &maps[0], nr_maps*sizeof(*maps));
+      maps[0] = map;
+      nr_maps++;
+    }
+
+  /* convert into a set */
+#if KERNEL_CPU_MASK_BITS == HWLOC_BITS_PER_LONG
+  for(i=0; i<nr_maps; i++)
+    hwloc_bitmap_set_ith_ulong(set, i, maps[i]);
+#else
+  for(i=0; i<(nr_maps+1)/2; i++) {
+    unsigned long mask;
+    mask = maps[2*i];
+    if (2*i+1<nr_maps)
+      mask |= maps[2*i+1] << KERNEL_CPU_MASK_BITS;
+    hwloc_bitmap_set_ith_ulong(set, i, mask);
+  }
+#endif
+
+  free(maps);
+  return 0;
+}
+
+static hwloc_bitmap_t
+hwloc_parse_cpumap(const char *mappath, int fsroot_fd)
+{
+  hwloc_bitmap_t set;
+  FILE * file;
+
+  file = hwloc_fopen(mappath, "r", fsroot_fd);
+  if (!file)
+    return NULL;
+
+  set = hwloc_bitmap_alloc();
+  hwloc_linux_parse_cpumap_file(file, set);
+
+  fclose(file);
+  return set;
+}
+
+/*
+ * Linux cpusets may be managed directly or through cgroup.
+ * If cgroup is used, tasks get /proc/pid/cgroup which may contain a
+ * single line %d:cpuset:<name>. If cpuset are used they get /proc/pid/cpuset
+ * containing <name>.
+ */
+static char *
+hwloc_read_linux_cpuset_name(int fsroot_fd, hwloc_pid_t pid)
+{
+#define CPUSET_NAME_LEN 128
+  char cpuset_name[CPUSET_NAME_LEN];
+  FILE *fd;
+  char *tmp;
+
+  /* check whether a cgroup-cpuset is enabled */
+  if (!pid)
+    fd = hwloc_fopen("/proc/self/cgroup", "r", fsroot_fd);
+  else {
+    char path[] = "/proc/XXXXXXXXXX/cgroup";
+    snprintf(path, sizeof(path), "/proc/%d/cgroup", pid);
+    fd = hwloc_fopen(path, "r", fsroot_fd);
+  }
+  if (fd) {
+    /* find a cpuset line */
+#define CGROUP_LINE_LEN 256
+    char line[CGROUP_LINE_LEN];
+    while (fgets(line, sizeof(line), fd)) {
+      char *end, *colon = strchr(line, ':');
+      if (!colon)
+        continue;
+      if (strncmp(colon, ":cpuset:", 8))
+        continue;
+
+      /* found a cgroup-cpuset line, return the name */
+      fclose(fd);
+      end = strchr(colon, '\n');
+      if (end)
+        *end = '\0';
+      hwloc_debug("Found cgroup-cpuset %s\n", colon+8);
+      return strdup(colon+8);
+    }
+    fclose(fd);
+  }
+
+  /* check whether a cpuset is enabled */
+  if (!pid)
+    fd = hwloc_fopen("/proc/self/cpuset", "r", fsroot_fd);
+  else {
+    char path[] = "/proc/XXXXXXXXXX/cpuset";
+    snprintf(path, sizeof(path), "/proc/%d/cpuset", pid);
+    fd = hwloc_fopen(path, "r", fsroot_fd);
+  }
+  if (!fd) {
+    /* found nothing */
+    hwloc_debug("%s", "No cgroup or cpuset found\n");
+    return NULL;
+  }
+
+  /* found a cpuset, return the name */
+  tmp = fgets(cpuset_name, sizeof(cpuset_name), fd);
+  fclose(fd);
+  if (!tmp)
+    return NULL;
+  tmp = strchr(cpuset_name, '\n');
+  if (tmp)
+    *tmp = '\0';
+  hwloc_debug("Found cpuset %s\n", cpuset_name);
+  return strdup(cpuset_name);
+}
+
+/*
+ * Then, the cpuset description is available from either the cgroup or
+ * the cpuset filesystem (usually mounted in / or /dev) where there
+ * are cgroup/cpuset.{cpus,mems} or cpuset/{cpus,mems} files.
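+ *
+ * For example, with the cgroup filesystem on a hypothetical
+ * /sys/fs/cgroup/cpuset mount point and a task in cpuset "/mygroup",
+ * the masks would be read from
+ *   /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus   (e.g. "0-3,8-11")
+ *   /sys/fs/cgroup/cpuset/mygroup/cpuset.mems   (e.g. "0")
+ * while a standalone cpuset filesystem mounted on /dev/cpuset would
+ * expose /dev/cpuset/mygroup/cpus and /dev/cpuset/mygroup/mems instead.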
+ */ +static char * +hwloc_read_linux_cpuset_mask(const char *cgroup_mntpnt, const char *cpuset_mntpnt, const char *cpuset_name, const char *attr_name, int fsroot_fd) +{ +#define CPUSET_FILENAME_LEN 256 + char cpuset_filename[CPUSET_FILENAME_LEN]; + FILE *fd; + char *info = NULL, *tmp; + ssize_t ssize; + size_t size; + + if (cgroup_mntpnt) { + /* try to read the cpuset from cgroup */ + snprintf(cpuset_filename, CPUSET_FILENAME_LEN, "%s%s/cpuset.%s", cgroup_mntpnt, cpuset_name, attr_name); + hwloc_debug("Trying to read cgroup file <%s>\n", cpuset_filename); + fd = hwloc_fopen(cpuset_filename, "r", fsroot_fd); + if (fd) + goto gotfile; + } else if (cpuset_mntpnt) { + /* try to read the cpuset directly */ + snprintf(cpuset_filename, CPUSET_FILENAME_LEN, "%s%s/%s", cpuset_mntpnt, cpuset_name, attr_name); + hwloc_debug("Trying to read cpuset file <%s>\n", cpuset_filename); + fd = hwloc_fopen(cpuset_filename, "r", fsroot_fd); + if (fd) + goto gotfile; + } + + /* found no cpuset description, ignore it */ + hwloc_debug("Couldn't find cpuset <%s> description, ignoring\n", cpuset_name); + goto out; + +gotfile: + ssize = getline(&info, &size, fd); + fclose(fd); + if (ssize < 0) + goto out; + if (!info) + goto out; + + tmp = strchr(info, '\n'); + if (tmp) + *tmp = '\0'; + +out: + return info; +} + +static void +hwloc_admin_disable_set_from_cpuset(struct hwloc_linux_backend_data_s *data, + const char *cgroup_mntpnt, const char *cpuset_mntpnt, const char *cpuset_name, + const char *attr_name, + hwloc_bitmap_t admin_enabled_cpus_set) +{ + char *cpuset_mask; + char *current, *comma, *tmp; + int prevlast, nextfirst, nextlast; /* beginning/end of enabled-segments */ + hwloc_bitmap_t tmpset; + + cpuset_mask = hwloc_read_linux_cpuset_mask(cgroup_mntpnt, cpuset_mntpnt, cpuset_name, + attr_name, data->root_fd); + if (!cpuset_mask) + return; + + hwloc_debug("found cpuset %s: %s\n", attr_name, cpuset_mask); + + current = cpuset_mask; + prevlast = -1; + + while (1) { + /* save a pointer to the next comma and erase it to simplify things */ + comma = strchr(current, ','); + if (comma) + *comma = '\0'; + + /* find current enabled-segment bounds */ + nextfirst = strtoul(current, &tmp, 0); + if (*tmp == '-') + nextlast = strtoul(tmp+1, NULL, 0); + else + nextlast = nextfirst; + if (prevlast+1 <= nextfirst-1) { + hwloc_debug("%s [%d:%d] excluded by cpuset\n", attr_name, prevlast+1, nextfirst-1); + hwloc_bitmap_clr_range(admin_enabled_cpus_set, prevlast+1, nextfirst-1); + } + + /* switch to next enabled-segment */ + prevlast = nextlast; + if (!comma) + break; + current = comma+1; + } + + hwloc_debug("%s [%d:%d] excluded by cpuset\n", attr_name, prevlast+1, nextfirst-1); + /* no easy way to clear until the infinity */ + tmpset = hwloc_bitmap_alloc(); + hwloc_bitmap_set_range(tmpset, 0, prevlast); + hwloc_bitmap_and(admin_enabled_cpus_set, admin_enabled_cpus_set, tmpset); + hwloc_bitmap_free(tmpset); + + free(cpuset_mask); +} + +static void +hwloc_parse_meminfo_info(struct hwloc_linux_backend_data_s *data, + const char *path, + int prefixlength, + uint64_t *local_memory, + uint64_t *meminfo_hugepages_count, + uint64_t *meminfo_hugepages_size, + int onlytotal) +{ + char string[64]; + FILE *fd; + + fd = hwloc_fopen(path, "r", data->root_fd); + if (!fd) + return; + + while (fgets(string, sizeof(string), fd) && *string != '\0') + { + unsigned long long number; + if (strlen(string) < (size_t) prefixlength) + continue; + if (sscanf(string+prefixlength, "MemTotal: %llu kB", (unsigned long long *) &number) == 1) { + *local_memory 
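+      /* (kB -> bytes via <<10. An abridged /proc/meminfo with illustrative
+       * values, showing the three fields parsed here:
+       *
+       *   MemTotal:       16384256 kB
+       *   Hugepagesize:       2048 kB
+       *   HugePages_Free:       12
+       * ) */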
= number << 10; + if (onlytotal) + break; + } + else if (!onlytotal) { + if (sscanf(string+prefixlength, "Hugepagesize: %llu", (unsigned long long *) &number) == 1) + *meminfo_hugepages_size = number << 10; + else if (sscanf(string+prefixlength, "HugePages_Free: %llu", (unsigned long long *) &number) == 1) + /* these are free hugepages, not the total amount of huge pages */ + *meminfo_hugepages_count = number; + } + } + + fclose(fd); +} + +#define SYSFS_NUMA_NODE_PATH_LEN 128 + +static void +hwloc_parse_hugepages_info(struct hwloc_linux_backend_data_s *data, + const char *dirpath, + struct hwloc_obj_memory_s *memory, + uint64_t *remaining_local_memory) +{ + DIR *dir; + struct dirent *dirent; + unsigned long index_ = 1; + FILE *hpfd; + char line[64]; + char path[SYSFS_NUMA_NODE_PATH_LEN]; + + dir = hwloc_opendir(dirpath, data->root_fd); + if (dir) { + while ((dirent = readdir(dir)) != NULL) { + if (strncmp(dirent->d_name, "hugepages-", 10)) + continue; + memory->page_types[index_].size = strtoul(dirent->d_name+10, NULL, 0) * 1024ULL; + sprintf(path, "%s/%s/nr_hugepages", dirpath, dirent->d_name); + hpfd = hwloc_fopen(path, "r", data->root_fd); + if (hpfd) { + if (fgets(line, sizeof(line), hpfd)) { + /* these are the actual total amount of huge pages */ + memory->page_types[index_].count = strtoull(line, NULL, 0); + *remaining_local_memory -= memory->page_types[index_].count * memory->page_types[index_].size; + index_++; + } + fclose(hpfd); + } + } + closedir(dir); + memory->page_types_len = index_; + } +} + +static void +hwloc_get_kerrighed_node_meminfo_info(struct hwloc_topology *topology, + struct hwloc_linux_backend_data_s *data, + unsigned long node, struct hwloc_obj_memory_s *memory) +{ + char path[128]; + uint64_t meminfo_hugepages_count, meminfo_hugepages_size = 0; + + if (topology->is_thissystem) { + memory->page_types_len = 2; + memory->page_types = malloc(2*sizeof(*memory->page_types)); + memset(memory->page_types, 0, 2*sizeof(*memory->page_types)); + /* Try to get the hugepage size from sysconf in case we fail to get it from /proc/meminfo later */ +#ifdef HAVE__SC_LARGE_PAGESIZE + memory->page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); +#endif + memory->page_types[0].size = hwloc_getpagesize(); + } + + snprintf(path, sizeof(path), "/proc/nodes/node%lu/meminfo", node); + hwloc_parse_meminfo_info(data, path, 0 /* no prefix */, + &memory->local_memory, + &meminfo_hugepages_count, &meminfo_hugepages_size, + memory->page_types == NULL); + + if (memory->page_types) { + uint64_t remaining_local_memory = memory->local_memory; + if (meminfo_hugepages_size) { + memory->page_types[1].size = meminfo_hugepages_size; + memory->page_types[1].count = meminfo_hugepages_count; + remaining_local_memory -= meminfo_hugepages_count * meminfo_hugepages_size; + } else { + memory->page_types_len = 1; + } + memory->page_types[0].count = remaining_local_memory / memory->page_types[0].size; + } +} + +static void +hwloc_get_procfs_meminfo_info(struct hwloc_topology *topology, + struct hwloc_linux_backend_data_s *data, + struct hwloc_obj_memory_s *memory) +{ + uint64_t meminfo_hugepages_count, meminfo_hugepages_size = 0; + struct stat st; + int has_sysfs_hugepages = 0; + char *pagesize_env = getenv("HWLOC_DEBUG_PAGESIZE"); + int types = 2; + int err; + + err = hwloc_stat("/sys/kernel/mm/hugepages", &st, data->root_fd); + if (!err) { + types = 1 + st.st_nlink-2; + has_sysfs_hugepages = 1; + } + + if (topology->is_thissystem || pagesize_env) { + /* we cannot report any page_type info unless we have the page 
size. + * we'll take it either from the system if local, or from the debug env variable + */ + memory->page_types_len = types; + memory->page_types = calloc(types, sizeof(*memory->page_types)); + } + + if (topology->is_thissystem) { + /* Get the page and hugepage sizes from sysconf */ +#ifdef HAVE__SC_LARGE_PAGESIZE + memory->page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); +#endif + memory->page_types[0].size = hwloc_getpagesize(); /* might be overwritten later by /proc/meminfo or sysfs */ + } + + hwloc_parse_meminfo_info(data, "/proc/meminfo", 0 /* no prefix */, + &memory->local_memory, + &meminfo_hugepages_count, &meminfo_hugepages_size, + memory->page_types == NULL); + + if (memory->page_types) { + uint64_t remaining_local_memory = memory->local_memory; + if (has_sysfs_hugepages) { + /* read from node%d/hugepages/hugepages-%skB/nr_hugepages */ + hwloc_parse_hugepages_info(data, "/sys/kernel/mm/hugepages", memory, &remaining_local_memory); + } else { + /* use what we found in meminfo */ + if (meminfo_hugepages_size) { + memory->page_types[1].size = meminfo_hugepages_size; + memory->page_types[1].count = meminfo_hugepages_count; + remaining_local_memory -= meminfo_hugepages_count * meminfo_hugepages_size; + } else { + memory->page_types_len = 1; + } + } + + if (pagesize_env) { + /* We cannot get the pagesize if not thissystem, use the env-given one to experience the code during make check */ + memory->page_types[0].size = strtoull(pagesize_env, NULL, 10); + /* If failed, use 4kB */ + if (!memory->page_types[0].size) + memory->page_types[0].size = 4096; + } + assert(memory->page_types[0].size); /* from sysconf if local or from the env */ + /* memory->page_types[1].size from sysconf if local, or from /proc/meminfo, or from sysfs, + * may be 0 if no hugepage support in the kernel */ + + memory->page_types[0].count = remaining_local_memory / memory->page_types[0].size; + } +} + +static void +hwloc_sysfs_node_meminfo_info(struct hwloc_topology *topology, + struct hwloc_linux_backend_data_s *data, + const char *syspath, int node, + struct hwloc_obj_memory_s *memory) +{ + char path[SYSFS_NUMA_NODE_PATH_LEN]; + char meminfopath[SYSFS_NUMA_NODE_PATH_LEN]; + uint64_t meminfo_hugepages_count = 0; + uint64_t meminfo_hugepages_size = 0; + struct stat st; + int has_sysfs_hugepages = 0; + int types = 2; + int err; + + sprintf(path, "%s/node%d/hugepages", syspath, node); + err = hwloc_stat(path, &st, data->root_fd); + if (!err) { + types = 1 + st.st_nlink-2; + has_sysfs_hugepages = 1; + } + + if (topology->is_thissystem) { + memory->page_types_len = types; + memory->page_types = malloc(types*sizeof(*memory->page_types)); + memset(memory->page_types, 0, types*sizeof(*memory->page_types)); + } + + sprintf(meminfopath, "%s/node%d/meminfo", syspath, node); + hwloc_parse_meminfo_info(data, meminfopath, + snprintf(NULL, 0, "Node %d ", node), + &memory->local_memory, + &meminfo_hugepages_count, NULL /* no hugepage size in node-specific meminfo */, + memory->page_types == NULL); + + if (memory->page_types) { + uint64_t remaining_local_memory = memory->local_memory; + if (has_sysfs_hugepages) { + /* read from node%d/hugepages/hugepages-%skB/nr_hugepages */ + hwloc_parse_hugepages_info(data, path, memory, &remaining_local_memory); + } else { + /* get hugepage size from machine-specific meminfo since there is no size in node-specific meminfo, + * hwloc_get_procfs_meminfo_info must have been called earlier */ + meminfo_hugepages_size = topology->levels[0][0]->memory.page_types[1].size; + /* use what we found in 
meminfo */ + if (meminfo_hugepages_size) { + memory->page_types[1].count = meminfo_hugepages_count; + memory->page_types[1].size = meminfo_hugepages_size; + remaining_local_memory -= meminfo_hugepages_count * meminfo_hugepages_size; + } else { + memory->page_types_len = 1; + } + } + /* update what's remaining as normal pages */ + memory->page_types[0].size = hwloc_getpagesize(); + memory->page_types[0].count = remaining_local_memory / memory->page_types[0].size; + } +} + +static void +hwloc_parse_node_distance(const char *distancepath, unsigned nbnodes, float *distances, int fsroot_fd) +{ + char string[4096]; /* enough for hundreds of nodes */ + char *tmp, *next; + FILE * fd; + + fd = hwloc_fopen(distancepath, "r", fsroot_fd); + if (!fd) + return; + + if (!fgets(string, sizeof(string), fd)) { + fclose(fd); + return; + } + + tmp = string; + while (tmp) { + unsigned distance = strtoul(tmp, &next, 0); + if (next == tmp) + break; + *distances = (float) distance; + distances++; + nbnodes--; + if (!nbnodes) + break; + tmp = next+1; + } + + fclose(fd); +} + +static void +hwloc__get_dmi_one_info(struct hwloc_linux_backend_data_s *data, + hwloc_obj_t obj, + char *path, unsigned pathlen, + const char *dmi_name, const char *hwloc_name) +{ + char dmi_line[64]; + char *tmp; + FILE *fd; + + strcpy(path+pathlen, dmi_name); + fd = hwloc_fopen(path, "r", data->root_fd); + if (!fd) + return; + + dmi_line[0] = '\0'; + tmp = fgets(dmi_line, sizeof(dmi_line), fd); + fclose (fd); + + if (tmp && dmi_line[0] != '\0') { + tmp = strchr(dmi_line, '\n'); + if (tmp) + *tmp = '\0'; + hwloc_debug("found %s '%s'\n", hwloc_name, dmi_line); + hwloc_obj_add_info(obj, hwloc_name, dmi_line); + } +} + +static void +hwloc__get_dmi_info(struct hwloc_linux_backend_data_s *data, hwloc_obj_t obj) +{ + char path[128]; + unsigned pathlen; + DIR *dir; + + strcpy(path, "/sys/devices/virtual/dmi/id"); + dir = hwloc_opendir(path, data->root_fd); + if (dir) { + pathlen = 27; + } else { + strcpy(path, "/sys/class/dmi/id"); + dir = hwloc_opendir(path, data->root_fd); + if (dir) + pathlen = 17; + else + return; + } + closedir(dir); + + path[pathlen++] = '/'; + + hwloc__get_dmi_one_info(data, obj, path, pathlen, "product_name", "DMIProductName"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "product_version", "DMIProductVersion"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "product_serial", "DMIProductSerial"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "product_uuid", "DMIProductUUID"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "board_vendor", "DMIBoardVendor"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "board_name", "DMIBoardName"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "board_version", "DMIBoardVersion"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "board_serial", "DMIBoardSerial"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "board_asset_tag", "DMIBoardAssetTag"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "chassis_vendor", "DMIChassisVendor"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "chassis_type", "DMIChassisType"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "chassis_version", "DMIChassisVersion"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "chassis_serial", "DMIChassisSerial"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "chassis_asset_tag", "DMIChassisAssetTag"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "bios_vendor", "DMIBIOSVendor"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "bios_version", 
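+  /* (Each DMI attribute is a one-line sysfs file; illustratively,
+   * "cat /sys/class/dmi/id/bios_version" might print "1.2.3", which then
+   * becomes the info pair ("DMIBIOSVersion", "1.2.3") on the object.) */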
"DMIBIOSVersion"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "bios_date", "DMIBIOSDate"); + hwloc__get_dmi_one_info(data, obj, path, pathlen, "sys_vendor", "DMISysVendor"); +} + + + +/*********************************** + ****** Device tree Discovery ****** + ***********************************/ + +/* Reads the entire file and returns bytes read if bytes_read != NULL + * Returned pointer can be freed by using free(). */ +static void * +hwloc_read_raw(const char *p, const char *p1, size_t *bytes_read, int root_fd) +{ + char *fname = NULL; + char *ret = NULL; + struct stat fs; + int file = -1; + unsigned len; + + len = strlen(p) + 1 + strlen(p1) + 1; + fname = malloc(len); + if (NULL == fname) { + return NULL; + } + snprintf(fname, len, "%s/%s", p, p1); + + file = hwloc_open(fname, root_fd); + if (-1 == file) { + goto out_no_close; + } + if (fstat(file, &fs)) { + goto out; + } + + ret = (char *) malloc(fs.st_size); + if (NULL != ret) { + ssize_t cb = read(file, ret, fs.st_size); + if (cb == -1) { + free(ret); + ret = NULL; + } else { + if (NULL != bytes_read) + *bytes_read = cb; + } + } + + out: + close(file); + out_no_close: + if (NULL != fname) { + free(fname); + } + return ret; +} + +/* Reads the entire file and returns it as a 0-terminated string + * Returned pointer can be freed by using free(). */ +static char * +hwloc_read_str(const char *p, const char *p1, int root_fd) +{ + size_t cb = 0; + char *ret = hwloc_read_raw(p, p1, &cb, root_fd); + if ((NULL != ret) && (0 < cb) && (0 != ret[cb-1])) { + ret = realloc(ret, cb + 1); + ret[cb] = 0; + } + return ret; +} + +/* Reads first 32bit bigendian value */ +static ssize_t +hwloc_read_unit32be(const char *p, const char *p1, uint32_t *buf, int root_fd) +{ + size_t cb = 0; + uint32_t *tmp = hwloc_read_raw(p, p1, &cb, root_fd); + if (sizeof(*buf) != cb) { + errno = EINVAL; + free(tmp); /* tmp is either NULL or contains useless things */ + return -1; + } + *buf = htonl(*tmp); + free(tmp); + return sizeof(*buf); +} + +typedef struct { + unsigned int n, allocated; + struct { + hwloc_bitmap_t cpuset; + uint32_t phandle; + uint32_t l2_cache; + char *name; + } *p; +} device_tree_cpus_t; + +static void +add_device_tree_cpus_node(device_tree_cpus_t *cpus, hwloc_bitmap_t cpuset, + uint32_t l2_cache, uint32_t phandle, const char *name) +{ + if (cpus->n == cpus->allocated) { + if (!cpus->allocated) + cpus->allocated = 64; + else + cpus->allocated *= 2; + cpus->p = realloc(cpus->p, cpus->allocated * sizeof(cpus->p[0])); + } + cpus->p[cpus->n].phandle = phandle; + cpus->p[cpus->n].cpuset = (NULL == cpuset)?NULL:hwloc_bitmap_dup(cpuset); + cpus->p[cpus->n].l2_cache = l2_cache; + cpus->p[cpus->n].name = strdup(name); + ++cpus->n; +} + +/* Walks over the cache list in order to detect nested caches and CPU mask for each */ +static int +look_powerpc_device_tree_discover_cache(device_tree_cpus_t *cpus, + uint32_t phandle, unsigned int *level, hwloc_bitmap_t cpuset) +{ + unsigned int i; + int ret = -1; + if ((NULL == level) || (NULL == cpuset) || phandle == (uint32_t) -1) + return ret; + for (i = 0; i < cpus->n; ++i) { + if (phandle != cpus->p[i].l2_cache) + continue; + if (NULL != cpus->p[i].cpuset) { + hwloc_bitmap_or(cpuset, cpuset, cpus->p[i].cpuset); + ret = 0; + } else { + ++(*level); + if (0 == look_powerpc_device_tree_discover_cache(cpus, + cpus->p[i].phandle, level, cpuset)) + ret = 0; + } + } + return ret; +} + +static void +try__add_cache_from_device_tree_cpu(struct hwloc_topology *topology, + unsigned int level, hwloc_obj_cache_type_t type, + 
uint32_t cache_line_size, uint32_t cache_size, uint32_t cache_sets, + hwloc_bitmap_t cpuset) +{ + struct hwloc_obj *c = NULL; + + if (0 == cache_size) + return; + + c = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); + c->attr->cache.depth = level; + c->attr->cache.linesize = cache_line_size; + c->attr->cache.size = cache_size; + c->attr->cache.type = type; + if (cache_sets == 1) + /* likely wrong, make it unknown */ + cache_sets = 0; + if (cache_sets && cache_line_size) + c->attr->cache.associativity = cache_size / (cache_sets * cache_line_size); + else + c->attr->cache.associativity = 0; + c->cpuset = hwloc_bitmap_dup(cpuset); + hwloc_debug_2args_bitmap("cache (%s) depth %d has cpuset %s\n", + type == HWLOC_OBJ_CACHE_UNIFIED ? "unified" : (type == HWLOC_OBJ_CACHE_DATA ? "data" : "instruction"), + level, c->cpuset); + hwloc_insert_object_by_cpuset(topology, c); +} + +static void +try_add_cache_from_device_tree_cpu(struct hwloc_topology *topology, + struct hwloc_linux_backend_data_s *data, + const char *cpu, unsigned int level, hwloc_bitmap_t cpuset) +{ + /* d-cache-block-size - ignore */ + /* d-cache-line-size - to read, in bytes */ + /* d-cache-sets - ignore */ + /* d-cache-size - to read, in bytes */ + /* i-cache, same for instruction */ + /* cache-unified only exist if data and instruction caches are unified */ + /* d-tlb-sets - ignore */ + /* d-tlb-size - ignore, always 0 on power6 */ + /* i-tlb-*, same */ + uint32_t d_cache_line_size = 0, d_cache_size = 0, d_cache_sets = 0; + uint32_t i_cache_line_size = 0, i_cache_size = 0, i_cache_sets = 0; + char unified_path[1024]; + struct stat statbuf; + int unified; + + snprintf(unified_path, sizeof(unified_path), "%s/cache-unified", cpu); + unified = (hwloc_stat(unified_path, &statbuf, data->root_fd) == 0); + + hwloc_read_unit32be(cpu, "d-cache-line-size", &d_cache_line_size, + data->root_fd); + hwloc_read_unit32be(cpu, "d-cache-size", &d_cache_size, + data->root_fd); + hwloc_read_unit32be(cpu, "d-cache-sets", &d_cache_sets, + data->root_fd); + hwloc_read_unit32be(cpu, "i-cache-line-size", &i_cache_line_size, + data->root_fd); + hwloc_read_unit32be(cpu, "i-cache-size", &i_cache_size, + data->root_fd); + hwloc_read_unit32be(cpu, "i-cache-sets", &i_cache_sets, + data->root_fd); + + if (!unified) + try__add_cache_from_device_tree_cpu(topology, level, HWLOC_OBJ_CACHE_INSTRUCTION, + i_cache_line_size, i_cache_size, i_cache_sets, cpuset); + try__add_cache_from_device_tree_cpu(topology, level, unified ? HWLOC_OBJ_CACHE_UNIFIED : HWLOC_OBJ_CACHE_DATA, + d_cache_line_size, d_cache_size, d_cache_sets, cpuset); +} + +/* + * Discovers L1/L2/L3 cache information on IBM PowerPC systems for old kernels (RHEL5.*) + * which provide NUMA nodes information without any details + */ +static void +look_powerpc_device_tree(struct hwloc_topology *topology, + struct hwloc_linux_backend_data_s *data) +{ + device_tree_cpus_t cpus; + const char ofroot[] = "/proc/device-tree/cpus"; + unsigned int i; + int root_fd = data->root_fd; + DIR *dt = hwloc_opendir(ofroot, root_fd); + struct dirent *dirent; + + cpus.n = 0; + cpus.p = NULL; + cpus.allocated = 0; + + if (NULL == dt) + return; + + while (NULL != (dirent = readdir(dt))) { + struct stat statbuf; + int err; + char *cpu; + char *device_type; + uint32_t reg = -1, l2_cache = -1, phandle = -1; + unsigned len; + + if ('.' 
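+    /* (Associativity above is size / (sets * linesize): e.g. a 32768-byte
+     * cache with 128-byte lines and 64 sets gives 32768 / (64 * 128) = 4,
+     * i.e. 4-way set-associative; a reported cache_sets of 1 is treated
+     * as unknown since it is likely wrong. Values illustrative.) */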
== dirent->d_name[0])
+      continue;
+
+    len = sizeof(ofroot) + 1 + strlen(dirent->d_name) + 1;
+    cpu = malloc(len);
+    if (NULL == cpu) {
+      continue;
+    }
+    snprintf(cpu, len, "%s/%s", ofroot, dirent->d_name);
+
+    err = hwloc_stat(cpu, &statbuf, root_fd);
+    if (err < 0 || !S_ISDIR(statbuf.st_mode))
+      goto cont;
+
+    device_type = hwloc_read_str(cpu, "device_type", root_fd);
+    if (NULL == device_type)
+      goto cont;
+
+    hwloc_read_unit32be(cpu, "reg", &reg, root_fd);
+    if (hwloc_read_unit32be(cpu, "next-level-cache", &l2_cache, root_fd) == -1)
+      hwloc_read_unit32be(cpu, "l2-cache", &l2_cache, root_fd);
+    if (hwloc_read_unit32be(cpu, "phandle", &phandle, root_fd) == -1)
+      if (hwloc_read_unit32be(cpu, "ibm,phandle", &phandle, root_fd) == -1)
+        hwloc_read_unit32be(cpu, "linux,phandle", &phandle, root_fd);
+
+    if (0 == strcmp(device_type, "cache")) {
+      add_device_tree_cpus_node(&cpus, NULL, l2_cache, phandle, dirent->d_name);
+    }
+    else if (0 == strcmp(device_type, "cpu")) {
+      /* Found CPU */
+      hwloc_bitmap_t cpuset = NULL;
+      size_t cb = 0;
+      uint32_t *threads = hwloc_read_raw(cpu, "ibm,ppc-interrupt-server#s", &cb, root_fd);
+      uint32_t nthreads = cb / sizeof(threads[0]);
+
+      if (NULL != threads) {
+        cpuset = hwloc_bitmap_alloc();
+        for (i = 0; i < nthreads; ++i) {
+          if (hwloc_bitmap_isset(topology->levels[0][0]->complete_cpuset, ntohl(threads[i])))
+            hwloc_bitmap_set(cpuset, ntohl(threads[i]));
+        }
+        free(threads);
+      } else if ((unsigned int)-1 != reg) {
+        cpuset = hwloc_bitmap_alloc();
+        hwloc_bitmap_set(cpuset, reg);
+      }
+
+      if (NULL == cpuset) {
+        hwloc_debug("%s has no \"reg\" property, skipping\n", cpu);
+      } else {
+        struct hwloc_obj *core = NULL;
+        add_device_tree_cpus_node(&cpus, cpuset, l2_cache, phandle, dirent->d_name);
+
+        /* Add core */
+        core = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, reg);
+        core->cpuset = hwloc_bitmap_dup(cpuset);
+        hwloc_insert_object_by_cpuset(topology, core);
+
+        /* Add L1 cache */
+        try_add_cache_from_device_tree_cpu(topology, data, cpu, 1, cpuset);
+
+        hwloc_bitmap_free(cpuset);
+      }
+    }
+    free(device_type);
+cont:
+    free(cpu);
+  }
+  closedir(dt);
+
+  /* No cores and L2 cache were found, exiting */
+  if (0 == cpus.n) {
+    hwloc_debug("No cores and L2 cache were found in %s, exiting\n", ofroot);
+    return;
+  }
+
+#ifdef HWLOC_DEBUG
+  for (i = 0; i < cpus.n; ++i) {
+    hwloc_debug("%i: %s ibm,phandle=%08X l2_cache=%08X ",
+                i, cpus.p[i].name, cpus.p[i].phandle, cpus.p[i].l2_cache);
+    if (NULL == cpus.p[i].cpuset) {
+      hwloc_debug("%s\n", "no cpuset");
+    } else {
+      hwloc_debug_bitmap("cpuset %s\n", cpus.p[i].cpuset);
+    }
+  }
+#endif
+
+  /* Scan L2/L3/...
caches */ + for (i = 0; i < cpus.n; ++i) { + unsigned int level = 2; + hwloc_bitmap_t cpuset; + /* Skip real CPUs */ + if (NULL != cpus.p[i].cpuset) + continue; + + /* Calculate cache level and CPU mask */ + cpuset = hwloc_bitmap_alloc(); + if (0 == look_powerpc_device_tree_discover_cache(&cpus, + cpus.p[i].phandle, &level, cpuset)) { + char *cpu; + unsigned len; + + len = sizeof(ofroot) + 1 + strlen(cpus.p[i].name) + 1; + cpu = malloc(len); + if (NULL == cpu) { + return; + } + snprintf(cpu, len, "%s/%s", ofroot, cpus.p[i].name); + + try_add_cache_from_device_tree_cpu(topology, data, cpu, level, cpuset); + free(cpu); + } + hwloc_bitmap_free(cpuset); + } + + /* Do cleanup */ + for (i = 0; i < cpus.n; ++i) { + hwloc_bitmap_free(cpus.p[i].cpuset); + free(cpus.p[i].name); + } + free(cpus.p); +} + + + +/************************************** + ****** Sysfs Topology Discovery ****** + **************************************/ + +static int +look_sysfsnode(struct hwloc_topology *topology, + struct hwloc_linux_backend_data_s *data, + const char *path, unsigned *found) +{ + unsigned osnode; + unsigned nbnodes = 0; + DIR *dir; + struct dirent *dirent; + hwloc_bitmap_t nodeset; + + *found = 0; + + /* Get the list of nodes first */ + dir = hwloc_opendir(path, data->root_fd); + if (dir) + { + nodeset = hwloc_bitmap_alloc(); + while ((dirent = readdir(dir)) != NULL) + { + if (strncmp(dirent->d_name, "node", 4)) + continue; + osnode = strtoul(dirent->d_name+4, NULL, 0); + hwloc_bitmap_set(nodeset, osnode); + nbnodes++; + } + closedir(dir); + } + else + return -1; + + if (nbnodes <= 1) + { + hwloc_bitmap_free(nodeset); + return 0; + } + + /* For convenience, put these declarations inside a block. */ + + { + hwloc_obj_t * nodes = calloc(nbnodes, sizeof(hwloc_obj_t)); + unsigned *indexes = calloc(nbnodes, sizeof(unsigned)); + float * distances; + int failednodes = 0; + unsigned index_; + + if (NULL == nodes || NULL == indexes) { + free(nodes); + free(indexes); + hwloc_bitmap_free(nodeset); + nbnodes = 0; + goto out; + } + + /* Unsparsify node indexes. + * We'll need them later because Linux groups sparse distances + * and keeps them in order in the sysfs distance files. + * It'll simplify things in the meantime. 
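+     *
+     * For example, on a hypothetical machine exposing only node0 and node4,
+     * nbnodes is 2 and indexes[] becomes {0, 4}; node4's distance file then
+     * lists its distances in that same compact order.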
+ */ + index_ = 0; + hwloc_bitmap_foreach_begin (osnode, nodeset) { + indexes[index_] = osnode; + index_++; + } hwloc_bitmap_foreach_end(); + hwloc_bitmap_free(nodeset); + +#ifdef HWLOC_DEBUG + hwloc_debug("%s", "NUMA indexes: "); + for (index_ = 0; index_ < nbnodes; index_++) { + hwloc_debug(" %u", indexes[index_]); + } + hwloc_debug("%s", "\n"); +#endif + + /* Create NUMA objects */ + for (index_ = 0; index_ < nbnodes; index_++) { + char nodepath[SYSFS_NUMA_NODE_PATH_LEN]; + hwloc_bitmap_t cpuset; + hwloc_obj_t node, res_obj; + + osnode = indexes[index_]; + + sprintf(nodepath, "%s/node%u/cpumap", path, osnode); + cpuset = hwloc_parse_cpumap(nodepath, data->root_fd); + if (!cpuset) { + /* This NUMA object won't be inserted, we'll ignore distances */ + failednodes++; + continue; + } + + node = hwloc_alloc_setup_object(HWLOC_OBJ_NODE, osnode); + node->cpuset = cpuset; + node->nodeset = hwloc_bitmap_alloc(); + hwloc_bitmap_set(node->nodeset, osnode); + + hwloc_sysfs_node_meminfo_info(topology, data, path, osnode, &node->memory); + + hwloc_debug_1arg_bitmap("os node %u has cpuset %s\n", + osnode, node->cpuset); + res_obj = hwloc_insert_object_by_cpuset(topology, node); + if (node == res_obj) { + nodes[index_] = node; + } else { + /* We got merged somehow, could be a buggy BIOS reporting wrong NUMA node cpuset. + * This object disappeared, we'll ignore distances */ + failednodes++; + } + } + + if (failednodes) { + /* failed to read/create some nodes, don't bother reading/fixing + * a distance matrix that would likely be wrong anyway. + */ + nbnodes -= failednodes; + distances = NULL; + } else { + distances = calloc(nbnodes*nbnodes, sizeof(float)); + } + + if (NULL == distances) { + free(nodes); + free(indexes); + goto out; + } + + /* Get actual distances now */ + for (index_ = 0; index_ < nbnodes; index_++) { + char nodepath[SYSFS_NUMA_NODE_PATH_LEN]; + + osnode = indexes[index_]; + + /* Linux nodeX/distance file contains distance from X to other localities (from ACPI SLIT table or so), + * store them in slots X*N...X*N+N-1 */ + sprintf(nodepath, "%s/node%u/distance", path, osnode); + hwloc_parse_node_distance(nodepath, nbnodes, distances+index_*nbnodes, data->root_fd); + } + + hwloc_distances_set(topology, HWLOC_OBJ_NODE, nbnodes, indexes, nodes, distances, 0 /* OS cannot force */); + } + + out: + *found = nbnodes; + return 0; +} + +/* Look at Linux' /sys/devices/system/cpu/cpu%d/topology/ */ +static int +look_sysfscpu(struct hwloc_topology *topology, + struct hwloc_linux_backend_data_s *data, + const char *path, + struct hwloc_linux_cpuinfo_proc * cpuinfo_Lprocs, unsigned cpuinfo_numprocs) +{ + hwloc_bitmap_t cpuset; /* Set of cpus for which we have topology information */ +#define CPU_TOPOLOGY_STR_LEN 128 + char str[CPU_TOPOLOGY_STR_LEN]; + DIR *dir; + int i,j; + FILE *fd; + unsigned caches_added; + + /* fill the cpuset of interesting cpus */ + dir = hwloc_opendir(path, data->root_fd); + if (!dir) + return -1; + else { + struct dirent *dirent; + cpuset = hwloc_bitmap_alloc(); + + while ((dirent = readdir(dir)) != NULL) { + unsigned long cpu; + char online[2]; + + if (strncmp(dirent->d_name, "cpu", 3)) + continue; + cpu = strtoul(dirent->d_name+3, NULL, 0); + + /* Maybe we don't have topology information but at least it exists */ + hwloc_bitmap_set(topology->levels[0][0]->complete_cpuset, cpu); + + /* check whether this processor is online */ + sprintf(str, "%s/cpu%lu/online", path, cpu); + fd = hwloc_fopen(str, "r", data->root_fd); + if (fd) { + if (fgets(online, sizeof(online), fd)) { + 
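+        /* (cpuN/online holds a single "0" or "1"; two bytes are enough
+         * since only the first digit matters, e.g.
+         * "cat /sys/devices/system/cpu/cpu2/online" printing "1" for an
+         * online processor. Example value illustrative.) */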
fclose(fd); + if (atoi(online)) { + hwloc_debug("os proc %lu is online\n", cpu); + } else { + hwloc_debug("os proc %lu is offline\n", cpu); + hwloc_bitmap_clr(topology->levels[0][0]->online_cpuset, cpu); + } + } else { + fclose(fd); + } + } + + /* check whether the kernel exports topology information for this cpu */ + sprintf(str, "%s/cpu%lu/topology", path, cpu); + if (hwloc_access(str, X_OK, data->root_fd) < 0 && errno == ENOENT) { + hwloc_debug("os proc %lu has no accessible %s/cpu%lu/topology\n", + cpu, path, cpu); + continue; + } + + hwloc_bitmap_set(cpuset, cpu); + } + closedir(dir); + } + + topology->support.discovery->pu = 1; + hwloc_debug_1arg_bitmap("found %d cpu topologies, cpuset %s\n", + hwloc_bitmap_weight(cpuset), cpuset); + + caches_added = 0; + hwloc_bitmap_foreach_begin(i, cpuset) + { + hwloc_bitmap_t socketset, coreset, bookset, threadset, savedcoreset; + unsigned mysocketid, mycoreid, mybookid; + int threadwithcoreid = 0; + + /* look at the socket */ + mysocketid = 0; /* shut-up the compiler */ + sprintf(str, "%s/cpu%d/topology/physical_package_id", path, i); + hwloc_parse_sysfs_unsigned(str, &mysocketid, data->root_fd); + + sprintf(str, "%s/cpu%d/topology/core_siblings", path, i); + socketset = hwloc_parse_cpumap(str, data->root_fd); + if (socketset && hwloc_bitmap_first(socketset) == i) { + /* first cpu in this socket, add the socket */ + struct hwloc_obj *sock = hwloc_alloc_setup_object(HWLOC_OBJ_SOCKET, mysocketid); + sock->cpuset = socketset; + hwloc_debug_1arg_bitmap("os socket %u has cpuset %s\n", + mysocketid, socketset); + /* add cpuinfo */ + if (cpuinfo_Lprocs) { + for(j=0; j<(int) cpuinfo_numprocs; j++) + if ((int) cpuinfo_Lprocs[j].Pproc == i + && cpuinfo_Lprocs[j].cpumodel) { + /* FIXME add to name as well? */ + hwloc_obj_add_info(sock, "CPUModel", cpuinfo_Lprocs[j].cpumodel); + } + } + hwloc_insert_object_by_cpuset(topology, sock); + socketset = NULL; /* don't free it */ + } + hwloc_bitmap_free(socketset); + + /* look at the core */ + mycoreid = 0; /* shut-up the compiler */ + sprintf(str, "%s/cpu%d/topology/core_id", path, i); + hwloc_parse_sysfs_unsigned(str, &mycoreid, data->root_fd); + + sprintf(str, "%s/cpu%d/topology/thread_siblings", path, i); + coreset = hwloc_parse_cpumap(str, data->root_fd); + savedcoreset = coreset; /* store it for later work-arounds */ + + if (coreset && hwloc_bitmap_weight(coreset) > 1) { + /* check if this is hyperthreading or different coreids */ + unsigned siblingid, siblingcoreid; + hwloc_bitmap_t set = hwloc_bitmap_dup(coreset); + hwloc_bitmap_clr(set, i); + siblingid = hwloc_bitmap_first(set); + siblingcoreid = mycoreid; + sprintf(str, "%s/cpu%d/topology/core_id", path, siblingid); + hwloc_parse_sysfs_unsigned(str, &siblingcoreid, data->root_fd); + threadwithcoreid = (siblingcoreid != mycoreid); + hwloc_bitmap_free(set); + } + + + if (coreset && (hwloc_bitmap_first(coreset) == i || threadwithcoreid)) { + /* regular core */ + struct hwloc_obj *core = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, mycoreid); + if (threadwithcoreid) { + /* amd multicore compute-unit, create one core per thread */ + core->cpuset = hwloc_bitmap_alloc(); + hwloc_bitmap_set(core->cpuset, i); + } else { + core->cpuset = coreset; + } + hwloc_debug_1arg_bitmap("os core %u has cpuset %s\n", + mycoreid, coreset); + hwloc_insert_object_by_cpuset(topology, core); + coreset = NULL; /* don't free it */ + } + + /* look at the books */ + mybookid = 0; /* shut-up the compiler */ + sprintf(str, "%s/cpu%d/topology/book_id", path, i); + if 
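+    /* (book_id/book_siblings are only exported by s390 kernels, where a
+     * "book" groups several sockets; a successful
+     * hwloc_parse_sysfs_unsigned() doubles as the existence test for the
+     * whole book hierarchy.) */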
(hwloc_parse_sysfs_unsigned(str, &mybookid, data->root_fd) == 0) { + + sprintf(str, "%s/cpu%d/topology/book_siblings", path, i); + bookset = hwloc_parse_cpumap(str, data->root_fd); + if (bookset && hwloc_bitmap_first(bookset) == i) { + struct hwloc_obj *book = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, mybookid); + book->cpuset = bookset; + hwloc_debug_1arg_bitmap("os book %u has cpuset %s\n", + mybookid, bookset); + hwloc_insert_object_by_cpuset(topology, book); + bookset = NULL; /* don't free it */ + } + } + + { + /* look at the thread */ + struct hwloc_obj *thread = hwloc_alloc_setup_object(HWLOC_OBJ_PU, i); + threadset = hwloc_bitmap_alloc(); + hwloc_bitmap_only(threadset, i); + thread->cpuset = threadset; + hwloc_debug_1arg_bitmap("thread %d has cpuset %s\n", + i, threadset); + hwloc_insert_object_by_cpuset(topology, thread); + } + + /* look at the caches */ + for(j=0; j<10; j++) { +#define SHARED_CPU_MAP_STRLEN 128 + char mappath[SHARED_CPU_MAP_STRLEN]; + char str2[20]; /* enough for a level number (one digit) or a type (Data/Instruction/Unified) */ + hwloc_bitmap_t cacheset; + unsigned long kB = 0; + unsigned linesize = 0; + unsigned sets = 0, lines_per_tag = 1; + int depth; /* 0 for L1, .... */ + hwloc_obj_cache_type_t type = HWLOC_OBJ_CACHE_UNIFIED; /* default */ + + /* get the cache level depth */ + sprintf(mappath, "%s/cpu%d/cache/index%d/level", path, i, j); + fd = hwloc_fopen(mappath, "r", data->root_fd); + if (fd) { + char *res = fgets(str2,sizeof(str2), fd); + fclose(fd); + if (res) + depth = strtoul(str2, NULL, 10)-1; + else + continue; + } else + continue; + + /* cache type */ + sprintf(mappath, "%s/cpu%d/cache/index%d/type", path, i, j); + fd = hwloc_fopen(mappath, "r", data->root_fd); + if (fd) { + if (fgets(str2, sizeof(str2), fd)) { + fclose(fd); + if (!strncmp(str2, "Data", 4)) + type = HWLOC_OBJ_CACHE_DATA; + else if (!strncmp(str2, "Unified", 7)) + type = HWLOC_OBJ_CACHE_UNIFIED; + else if (!strncmp(str2, "Instruction", 11)) + type = HWLOC_OBJ_CACHE_INSTRUCTION; + else + continue; + } else { + fclose(fd); + continue; + } + } else + continue; + + /* get the cache size */ + sprintf(mappath, "%s/cpu%d/cache/index%d/size", path, i, j); + fd = hwloc_fopen(mappath, "r", data->root_fd); + if (fd) { + if (fgets(str2,sizeof(str2), fd)) + kB = atol(str2); /* in kB */ + fclose(fd); + } + + /* get the line size */ + sprintf(mappath, "%s/cpu%d/cache/index%d/coherency_line_size", path, i, j); + fd = hwloc_fopen(mappath, "r", data->root_fd); + if (fd) { + if (fgets(str2,sizeof(str2), fd)) + linesize = atol(str2); /* in bytes */ + fclose(fd); + } + + /* get the number of sets and lines per tag. + * don't take the associativity directly in "ways_of_associativity" because + * some archs (ia64, ppc) put 0 there when fully-associative, while others (x86) put something like -1 there. 
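+       *
+       * Illustrative numbers: a 4096 kB unified L3 with 64-byte lines, one
+       * line per tag and 8192 sets yields (4096<<10) / 64 / 1 / 8192 = 8,
+       * i.e. 8-way set-associative, matching the formula used below.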
+ */ + sprintf(mappath, "%s/cpu%d/cache/index%d/number_of_sets", path, i, j); + fd = hwloc_fopen(mappath, "r", data->root_fd); + if (fd) { + if (fgets(str2,sizeof(str2), fd)) + sets = atol(str2); + fclose(fd); + } + sprintf(mappath, "%s/cpu%d/cache/index%d/physical_line_partition", path, i, j); + fd = hwloc_fopen(mappath, "r", data->root_fd); + if (fd) { + if (fgets(str2,sizeof(str2), fd)) + lines_per_tag = atol(str2); + fclose(fd); + } + + sprintf(mappath, "%s/cpu%d/cache/index%d/shared_cpu_map", path, i, j); + cacheset = hwloc_parse_cpumap(mappath, data->root_fd); + if (cacheset) { + if (hwloc_bitmap_weight(cacheset) < 1) { + /* mask is wrong (useful for many itaniums) */ + if (savedcoreset) + /* assume it's a core-specific cache */ + hwloc_bitmap_copy(cacheset, savedcoreset); + else + /* assumes it's not shared */ + hwloc_bitmap_only(cacheset, i); + } + + if (hwloc_bitmap_first(cacheset) == i) { + /* first cpu in this cache, add the cache */ + struct hwloc_obj *cache = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); + cache->attr->cache.size = kB << 10; + cache->attr->cache.depth = depth+1; + cache->attr->cache.linesize = linesize; + cache->attr->cache.type = type; + if (!linesize || !lines_per_tag || !sets) + cache->attr->cache.associativity = 0; /* unknown */ + else if (sets == 1) + cache->attr->cache.associativity = 0; /* likely wrong, make it unknown */ + else + cache->attr->cache.associativity = (kB << 10) / linesize / lines_per_tag / sets; + cache->cpuset = cacheset; + hwloc_debug_1arg_bitmap("cache depth %d has cpuset %s\n", + depth, cacheset); + hwloc_insert_object_by_cpuset(topology, cache); + cacheset = NULL; /* don't free it */ + ++caches_added; + } + } + hwloc_bitmap_free(cacheset); + } + hwloc_bitmap_free(coreset); + } + hwloc_bitmap_foreach_end(); + + if (0 == caches_added) + look_powerpc_device_tree(topology, data); + + hwloc_bitmap_free(cpuset); + + return 0; +} + + + +/**************************************** + ****** cpuinfo Topology Discovery ****** + ****************************************/ + +/* + * architecture properly detected: + * arm: "Processor\t:" => OK + * avr32: "chip type\t:" => OK + * blackfin: "model name\t:" => OK + * h8300: "CPU:" => OK + * ia64: "model name :" => OK + * m68k: "CPU:" => OK + * mips: "cpu model\t\t:" => OK + * openrisc: "CPU:" => OK + * ppc: "cpu\t\t:" => OK + * sparc: "cpu\t\t:" => OK + * tile: "model name\t:" => OK + * unicore32: "Processor\t:" => OK + * x86: "model name\t:" => OK + * + * cannot work: + * alpha: "cpu\t\t\t:" + "cpu model\t\t:" => no processor index lines anyway + * + * partially supported: + * cris: "cpu\t\t:" + "cpu model\t:" => only "cpu" + * frv: "CPU-Core:" + "CPU:" => only "CPU" + * mn10300: "cpu core :" + "model name :" => only "model name" + * parisc: "cpu family\t:" + "cpu\t\t:" => only "cpu" + * + * not supported because of conflicts with other arch minor lines: + * m32r: "cpu family\t:" => KO (adding "cpu family" would break "blackfin") + * microblaze: "CPU-Family:" => KO + * sh: "cpu family\t:" + "cpu type\t:" => KO + * xtensa: "model\t\t:" => KO + */ +static int +hwloc_linux_parse_cpuinfo_model(const char *prefix, const char *value, + char **model) +{ + if (!strcmp("model name", prefix) + || !strcmp("Processor", prefix) + || !strcmp("chip type", prefix) + || !strcmp("cpu model", prefix) + || !strcasecmp("cpu", prefix)) { + if (!*model) + *model = strdup(value); + } + return 0; +} + +static int +hwloc_linux_parse_cpuinfo(struct hwloc_linux_backend_data_s *data, + const char *path, + struct 
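+/* (An abridged x86 /proc/cpuinfo entry as consumed below, values
+ * illustrative:
+ *
+ *   processor   : 0
+ *   model name  : SomeVendor CPU @ 2.00GHz
+ *   physical id : 0
+ *   core id     : 0
+ *
+ * "processor" starts a new logical CPU; "physical id" and "core id"
+ * fill the Psock/Pcore fields of the current entry.) */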
hwloc_linux_cpuinfo_proc ** Lprocs_p)
+{
+  FILE *fd;
+  char *str = NULL;
+  char *endptr;
+  unsigned len;
+  unsigned allocated_Lprocs = 0;
+  struct hwloc_linux_cpuinfo_proc * Lprocs = NULL;
+  unsigned numprocs = 0;
+  char *global_cpumodel = NULL;
+
+  if (!(fd=hwloc_fopen(path,"r", data->root_fd)))
+    {
+      hwloc_debug("could not open %s\n", path);
+      return -1;
+    }
+
+#  define PROCESSOR "processor"
+#  define PACKAGEID "physical id" /* the longest one */
+#  define COREID "core id"
+  len = 128; /* vendor/model can be very long */
+  str = malloc(len);
+  hwloc_debug("\n\n * Topology extraction from %s *\n\n", path);
+  while (fgets(str,len,fd)!=NULL) {
+    unsigned long Psock, Pcore, Pproc;
+    char *end, *dot, *prefix, *value;
+    int noend = 0;
+
+    /* remove the ending \n */
+    end = strchr(str, '\n');
+    if (end)
+      *end = 0;
+    else
+      noend = 1;
+    /* skip lines with no dot */
+    dot = strchr(str, ':');
+    if (!dot)
+      continue;
+    /* skip lines not starting with a letter */
+    if (*str > 'z' || *str < 'a')
+      continue;
+
+    /* mark the end of the prefix */
+    prefix = str;
+    end = dot;
+    while (end[-1] == ' ' || end[-1] == '\t') end--; /* need a strrspn() */
+    *end = 0;
+    /* find beginning of value, its end is already marked */
+    value = dot+1 + strspn(dot+1, " \t");
+
+    /* defines for parsing numbers */
+#   define getprocnb_begin(field, var) \
+    if (!strcmp(field,prefix)) { \
+      var = strtoul(value,&endptr,0); \
+      if (endptr==value) { \
+        hwloc_debug("no number in "field" field of %s\n", path); \
+        goto err; \
+      } else if (var==ULONG_MAX) { \
+        hwloc_debug("too big "field" number in %s\n", path); \
+        goto err; \
+      } \
+      hwloc_debug(field " %lu\n", var)
+#   define getprocnb_end() \
+    }
+    /* actually parse numbers */
+    getprocnb_begin(PROCESSOR, Pproc);
+    numprocs++;
+    if (numprocs > allocated_Lprocs) {
+      if (!allocated_Lprocs)
+        allocated_Lprocs = 8;
+      else
+        allocated_Lprocs *= 2;
+      Lprocs = realloc(Lprocs, allocated_Lprocs * sizeof(*Lprocs));
+    }
+    Lprocs[numprocs-1].Pproc = Pproc;
+    Lprocs[numprocs-1].Pcore = -1;
+    Lprocs[numprocs-1].Psock = -1;
+    Lprocs[numprocs-1].Lcore = -1;
+    Lprocs[numprocs-1].Lsock = -1;
+    Lprocs[numprocs-1].cpumodel = global_cpumodel ? strdup(global_cpumodel) : NULL;
+    getprocnb_end() else
+    getprocnb_begin(PACKAGEID, Psock);
+    Lprocs[numprocs-1].Psock = Psock;
+    getprocnb_end() else
+    getprocnb_begin(COREID, Pcore);
+    Lprocs[numprocs-1].Pcore = Pcore;
+    getprocnb_end() else {
+      /* we can't assume that we already got a processor index line:
+       * alpha/frv/h8300/m68k/microblaze/sparc have no processor lines at all, only a global entry.
+       * tile has a global section with model name before the list of processor lines.
+       */
+      hwloc_linux_parse_cpuinfo_model(prefix, value, numprocs ?
&Lprocs[numprocs-1].cpumodel : &global_cpumodel); + } + + if (noend) { + /* ignore end of line */ + if (fscanf(fd,"%*[^\n]") == EOF) + break; + getc(fd); + } + } + fclose(fd); + free(str); + free(global_cpumodel); + + *Lprocs_p = Lprocs; + return numprocs; + + err: + fclose(fd); + free(str); + free(global_cpumodel); + free(Lprocs); + return -1; +} + +static void +hwloc_linux_free_cpuinfo(struct hwloc_linux_cpuinfo_proc * Lprocs, unsigned numprocs) +{ + unsigned i; + for(i=0; icpuset = hwloc_bitmap_alloc(); + hwloc_bitmap_only(obj->cpuset, Pproc); + hwloc_debug_2args_bitmap("cpu %lu (os %lu) has cpuset %s\n", + Lproc, Pproc, obj->cpuset); + hwloc_insert_object_by_cpuset(topology, obj); + } + + topology->support.discovery->pu = 1; + hwloc_bitmap_copy(online_cpuset, cpuset); + hwloc_bitmap_free(cpuset); + + hwloc_debug("%u online processors found\n", numprocs); + hwloc_debug_bitmap("online processor cpuset: %s\n", online_cpuset); + + hwloc_debug("%s", "\n * Topology summary *\n"); + hwloc_debug("%u processors)\n", numprocs); + + /* fill Lprocs[].Lsock and Lsock_to_Psock */ + for(Lproc=0; Lproc0) { + for (i = 0; i < numsockets; i++) { + struct hwloc_obj *obj = hwloc_alloc_setup_object(HWLOC_OBJ_SOCKET, Lsock_to_Psock[i]); + char *cpumodel = NULL; + obj->cpuset = hwloc_bitmap_alloc(); + for(j=0; jcpuset, Lprocs[j].Pproc); + if (Lprocs[j].cpumodel && !cpumodel) /* use the first one, they should all be equal anyway */ + cpumodel = Lprocs[j].cpumodel; + } + if (cpumodel) { + /* FIXME add to name as well? */ + hwloc_obj_add_info(obj, "CPUModel", cpumodel); + } + hwloc_debug_1arg_bitmap("Socket %d has cpuset %s\n", i, obj->cpuset); + hwloc_insert_object_by_cpuset(topology, obj); + } + hwloc_debug("%s", "\n"); + } + + /* fill Lprocs[].Lcore, Lcore_to_Psock and Lcore_to_Pcore */ + for(Lproc=0; Lproc0) { + for (i = 0; i < numcores; i++) { + struct hwloc_obj *obj = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, Lcore_to_Pcore[i]); + obj->cpuset = hwloc_bitmap_alloc(); + for(j=0; jcpuset, Lprocs[j].Pproc); + hwloc_debug_1arg_bitmap("Core %d has cpuset %s\n", i, obj->cpuset); + hwloc_insert_object_by_cpuset(topology, obj); + } + hwloc_debug("%s", "\n"); + } + + free(Lcore_to_Pcore); + free(Lcore_to_Psock); + free(Lsock_to_Psock); + + hwloc_linux_free_cpuinfo(Lprocs, numprocs); + + look_powerpc_device_tree(topology, data); + return 0; +} + + + +/************************************* + ****** Main Topology Discovery ****** + *************************************/ + +static void +hwloc__linux_get_mic_sn(struct hwloc_topology *topology, struct hwloc_linux_backend_data_s *data) +{ + FILE *file; + char line[64], *tmp, *end; + file = hwloc_fopen("/proc/elog", "r", data->root_fd); + if (!file) + return; + if (!fgets(line, sizeof(line), file)) + goto out_with_file; + if (strncmp(line, "Card ", 5)) + goto out_with_file; + tmp = line + 5; + end = strchr(tmp, ':'); + if (!end) + goto out_with_file; + *end = '\0'; + hwloc_obj_add_info(hwloc_get_root_obj(topology), "MICSerialNumber", tmp); + + out_with_file: + fclose(file); +} + +static void +hwloc_linux_fallback_pu_level(struct hwloc_topology *topology) +{ + if (topology->is_thissystem) + hwloc_setup_pu_level(topology, hwloc_fallback_nbprocessors(topology)); + else + /* fsys-root but not this system, no way, assume there's just 1 + * processor :/ */ + hwloc_setup_pu_level(topology, 1); +} + +static int +hwloc_look_linuxfs(struct hwloc_backend *backend) +{ + struct hwloc_topology *topology = backend->topology; + struct hwloc_linux_backend_data_s *data = 
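+  /* (look_cpuinfo() above renumbers sparse physical ids into dense logical
+   * ones: with physical socket ids {0, 3}, Lsock_to_Psock becomes {0, 3}
+   * and the sockets get logical indexes 0 and 1. Values illustrative.) */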
backend->private_data; + DIR *nodes_dir; + unsigned nbnodes; + char *cpuset_mntpnt, *cgroup_mntpnt, *cpuset_name = NULL; + int err; + + if (topology->levels[0][0]->cpuset) + /* somebody discovered things */ + return 0; + + hwloc_alloc_obj_cpusets(topology->levels[0][0]); + + /* Gather the list of admin-disabled cpus and mems */ + hwloc_find_linux_cpuset_mntpnt(&cgroup_mntpnt, &cpuset_mntpnt, data->root_fd); + if (cgroup_mntpnt || cpuset_mntpnt) { + cpuset_name = hwloc_read_linux_cpuset_name(data->root_fd, topology->pid); + if (cpuset_name) { + hwloc_admin_disable_set_from_cpuset(data, cgroup_mntpnt, cpuset_mntpnt, cpuset_name, "cpus", topology->levels[0][0]->allowed_cpuset); + hwloc_admin_disable_set_from_cpuset(data, cgroup_mntpnt, cpuset_mntpnt, cpuset_name, "mems", topology->levels[0][0]->allowed_nodeset); + } + free(cgroup_mntpnt); + free(cpuset_mntpnt); + } + + nodes_dir = hwloc_opendir("/proc/nodes", data->root_fd); + if (nodes_dir) { + /* Kerrighed */ + struct dirent *dirent; + char path[128]; + hwloc_obj_t machine; + hwloc_bitmap_t machine_online_set; + + /* replace top-level object type with SYSTEM and add some MACHINE underneath */ + + topology->levels[0][0]->type = HWLOC_OBJ_SYSTEM; + topology->levels[0][0]->name = strdup("Kerrighed"); + + /* No cpuset support for now. */ + /* No sys support for now. */ + while ((dirent = readdir(nodes_dir)) != NULL) { + unsigned long node; + if (strncmp(dirent->d_name, "node", 4)) + continue; + machine_online_set = hwloc_bitmap_alloc(); + node = strtoul(dirent->d_name+4, NULL, 0); + snprintf(path, sizeof(path), "/proc/nodes/node%lu/cpuinfo", node); + err = look_cpuinfo(topology, data, path, machine_online_set); + if (err < 0) { + hwloc_bitmap_free(machine_online_set); + continue; + } + hwloc_bitmap_or(topology->levels[0][0]->online_cpuset, topology->levels[0][0]->online_cpuset, machine_online_set); + machine = hwloc_alloc_setup_object(HWLOC_OBJ_MACHINE, node); + machine->cpuset = machine_online_set; + hwloc_debug_1arg_bitmap("machine number %lu has cpuset %s\n", + node, machine_online_set); + + /* Get the machine memory attributes */ + hwloc_get_kerrighed_node_meminfo_info(topology, data, node, &machine->memory); + + /* Gather DMI info */ + /* FIXME: get the right DMI info of each machine */ + hwloc__get_dmi_info(data, machine); + + hwloc_insert_object_by_cpuset(topology, machine); + } + closedir(nodes_dir); + } else { + /* Get the machine memory attributes */ + hwloc_get_procfs_meminfo_info(topology, data, &topology->levels[0][0]->memory); + + /* Gather NUMA information. 
Must be after hwloc_get_procfs_meminfo_info so that the hugepage size is known */
+    if (look_sysfsnode(topology, data, "/sys/bus/node/devices", &nbnodes) < 0)
+      look_sysfsnode(topology, data, "/sys/devices/system/node", &nbnodes);
+
+    /* if we found some numa nodes, the machine object has no local memory */
+    if (nbnodes) {
+      unsigned i;
+      topology->levels[0][0]->memory.local_memory = 0;
+      if (topology->levels[0][0]->memory.page_types)
+        for(i=0; i<topology->levels[0][0]->memory.page_types_len; i++)
+          topology->levels[0][0]->memory.page_types[i].count = 0;
+    }
+
+    /* Gather the list of cpus now */
+    if (getenv("HWLOC_LINUX_USE_CPUINFO")
+        || (hwloc_access("/sys/devices/system/cpu/cpu0/topology/core_siblings", R_OK, data->root_fd) < 0
+            && hwloc_access("/sys/devices/system/cpu/cpu0/topology/thread_siblings", R_OK, data->root_fd) < 0
+            && hwloc_access("/sys/bus/cpu/devices/cpu0/topology/thread_siblings", R_OK, data->root_fd) < 0
+            && hwloc_access("/sys/bus/cpu/devices/cpu0/topology/core_siblings", R_OK, data->root_fd) < 0)) {
+      /* revert to reading cpuinfo only if /sys/.../topology unavailable (before 2.6.16)
+       * or not containing anything interesting */
+      err = look_cpuinfo(topology, data, "/proc/cpuinfo", topology->levels[0][0]->online_cpuset);
+      if (err < 0)
+        hwloc_linux_fallback_pu_level(topology);
+
+    } else {
+      struct hwloc_linux_cpuinfo_proc * Lprocs = NULL;
+      int numprocs = hwloc_linux_parse_cpuinfo(data, "/proc/cpuinfo", &Lprocs);
+      if (numprocs <= 0)
+        Lprocs = NULL;
+      if (look_sysfscpu(topology, data, "/sys/bus/cpu/devices", Lprocs, numprocs) < 0)
+        if (look_sysfscpu(topology, data, "/sys/devices/system/cpu", Lprocs, numprocs) < 0)
+          /* sysfs but we failed to read cpu topology, fallback */
+          hwloc_linux_fallback_pu_level(topology);
+      if (Lprocs)
+        hwloc_linux_free_cpuinfo(Lprocs, numprocs);
+    }
+
+    /* Gather DMI info */
+    hwloc__get_dmi_info(data, topology->levels[0][0]);
+  }
+
+  hwloc_obj_add_info(topology->levels[0][0], "Backend", "Linux");
+  if (cpuset_name) {
+    hwloc_obj_add_info(topology->levels[0][0], "LinuxCgroup", cpuset_name);
+    free(cpuset_name);
+  }
+
+  hwloc__linux_get_mic_sn(topology, data);
+
+  /* gather uname info if fsroot wasn't changed */
+  if (topology->is_thissystem)
+    hwloc_add_uname_info(topology);
+
+  return 1;
+}
+
+
+
+/****************************************
+ ***** Linux PCI backend callbacks ******
+ ****************************************
+ * Do not support changing the fsroot (use sysfs)
+ */
+
+static hwloc_obj_t
+hwloc_linux_add_os_device(struct hwloc_backend *backend, struct hwloc_obj *pcidev, hwloc_obj_osdev_type_t type, const char *name)
+{
+  struct hwloc_topology *topology = backend->topology;
+  struct hwloc_obj *obj = hwloc_alloc_setup_object(HWLOC_OBJ_OS_DEVICE, -1);
+  obj->name = strdup(name);
+  obj->logical_index = -1;
+  obj->attr->osdev.type = type;
+
+  hwloc_insert_object_by_parent(topology, pcidev, obj);
+  /* insert_object_by_parent() doesn't merge during insert, so obj is still valid */
+
+  return obj;
+}
+
+typedef void (*hwloc_linux_class_fillinfos_t)(struct hwloc_backend *backend, struct hwloc_obj *osdev, const char *osdevpath);
+
+/* cannot be used in fsroot-aware code, would have to move to a per-topology variable */
+
+static void
+hwloc_linux_check_deprecated_classlinks_model(struct hwloc_linux_backend_data_s *data)
+{
+  int root_fd = data->root_fd;
+  DIR *dir;
+  struct dirent *dirent;
+  char path[128];
+  struct stat st;
+
+  data->deprecated_classlinks_model = -1;
+
+  dir = hwloc_opendir("/sys/class/net", root_fd);
+  if (!dir)
+    return;
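+  /* (The probe below distinguishes the two historical sysfs layouts for
+   * class links, using a hypothetical interface eth0:
+   *
+   *   modern:     /sys/class/net/eth0/device/net/eth0  -> model 0
+   *   deprecated: /sys/class/net/eth0/device/net:eth0  -> model 1
+   *
+   * whichever stat() succeeds first decides the model for the whole run.) */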
while ((dirent = readdir(dir)) != NULL) {
+    if (!strcmp(dirent->d_name, ".") || !strcmp(dirent->d_name, "..") || !strcmp(dirent->d_name, "lo"))
+      continue;
+    snprintf(path, sizeof(path), "/sys/class/net/%s/device/net/%s", dirent->d_name, dirent->d_name);
+    if (hwloc_stat(path, &st, root_fd) == 0) {
+      data->deprecated_classlinks_model = 0;
+      goto out;
+    }
+    snprintf(path, sizeof(path), "/sys/class/net/%s/device/net:%s", dirent->d_name, dirent->d_name);
+    if (hwloc_stat(path, &st, root_fd) == 0) {
+      data->deprecated_classlinks_model = 1;
+      goto out;
+    }
+  }
+out:
+  closedir(dir);
+}
+
+/* class objects that are immediately below pci devices:
+ * look for objects of the given classname below a sysfs (pcidev) directory
+ */
+static int
+hwloc_linux_class_readdir(struct hwloc_backend *backend,
+                          struct hwloc_obj *pcidev, const char *devicepath,
+                          hwloc_obj_osdev_type_t type, const char *classname,
+                          hwloc_linux_class_fillinfos_t fillinfo)
+{
+  struct hwloc_linux_backend_data_s *data = backend->private_data;
+  int root_fd = data->root_fd;
+  size_t classnamelen = strlen(classname);
+  char path[256];
+  DIR *dir;
+  struct dirent *dirent;
+  hwloc_obj_t obj;
+  int res = 0, err;
+
+  if (data->deprecated_classlinks_model == -2)
+    hwloc_linux_check_deprecated_classlinks_model(data);
+
+  if (data->deprecated_classlinks_model != 1) {
+    /* modern sysfs: <device>/<class>/<name> */
+    struct stat st;
+    snprintf(path, sizeof(path), "%s/%s", devicepath, classname);
+
+    /* some very old kernels (2.6.9/RHEL4) have a <device>/<class> symlink without any way to find <name>.
+     * make sure <device>/<class> is a directory to avoid this case.
+     */
+    err = hwloc_lstat(path, &st, root_fd);
+    if (err < 0 || !S_ISDIR(st.st_mode))
+      goto trydeprecated;
+
+    dir = hwloc_opendir(path, root_fd);
+    if (dir) {
+      data->deprecated_classlinks_model = 0;
+      while ((dirent = readdir(dir)) != NULL) {
+        if (!strcmp(dirent->d_name, ".") || !strcmp(dirent->d_name, ".."))
+          continue;
+        obj = hwloc_linux_add_os_device(backend, pcidev, type, dirent->d_name);
+        if (fillinfo) {
+          snprintf(path, sizeof(path), "%s/%s/%s", devicepath, classname, dirent->d_name);
+          fillinfo(backend, obj, path);
+        }
+        res++;
+      }
+      closedir(dir);
+      return res;
+    }
+  }
+
+trydeprecated:
+  if (data->deprecated_classlinks_model != 0) {
+    /* deprecated sysfs: <device>/<class>:<name> */
+    dir = hwloc_opendir(devicepath, root_fd);
+    if (dir) {
+      while ((dirent = readdir(dir)) != NULL) {
+        if (strncmp(dirent->d_name, classname, classnamelen) || dirent->d_name[classnamelen] != ':')
+          continue;
+        data->deprecated_classlinks_model = 1;
+        obj = hwloc_linux_add_os_device(backend, pcidev, type, dirent->d_name + classnamelen+1);
+        if (fillinfo) {
+          snprintf(path, sizeof(path), "%s/%s", devicepath, dirent->d_name);
+          fillinfo(backend, obj, path);
+        }
+        res++;
+      }
+      closedir(dir);
+      return res;
+    }
+  }
+
+  return 0;
+}
+
+/*
+ * look for net objects below a pcidev in sysfs
+ */
+static void
+hwloc_linux_net_class_fillinfos(struct hwloc_backend *backend,
+                                struct hwloc_obj *obj, const char *osdevpath)
+{
+  struct hwloc_linux_backend_data_s *data = backend->private_data;
+  int root_fd = data->root_fd;
+  FILE *fd;
+  struct stat st;
+  char path[256];
+  snprintf(path, sizeof(path), "%s/address", osdevpath);
+  fd = hwloc_fopen(path, "r", root_fd);
+  if (fd) {
+    char address[128];
+    if (fgets(address, sizeof(address), fd)) {
+      char *eol = strchr(address, '\n');
+      if (eol)
+        *eol = 0;
+      hwloc_obj_add_info(obj, "Address", address);
+    }
+    fclose(fd);
+  }
+  snprintf(path, sizeof(path), "%s/device/infiniband", osdevpath);
+  if (!hwloc_stat(path, &st, root_fd)) {
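+    /* Editor's note: for IPoIB netdevices, sysfs exposes a zero-based dev_id
+     * (e.g. "0x0" for the first port of the adapter); the code below parses it
+     * and reports it 1-based as the "Port" info attribute, so it matches the
+     * usual InfiniBand port numbering.
+     */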
snprintf(path, sizeof(path), "%s/dev_id", osdevpath); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char hexid[16]; + if (fgets(hexid, sizeof(hexid), fd)) { + char *eoid; + unsigned long port; + port = strtoul(hexid, &eoid, 0); + if (eoid != hexid) { + char portstr[16]; + snprintf(portstr, sizeof(portstr), "%ld", port+1); + hwloc_obj_add_info(obj, "Port", portstr); + } + } + fclose(fd); + } + } +} + +static int +hwloc_linux_lookup_net_class(struct hwloc_backend *backend, + struct hwloc_obj *pcidev, const char *pcidevpath) +{ + return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_NETWORK, "net", hwloc_linux_net_class_fillinfos); +} + +/* + * look for infiniband objects below a pcidev in sysfs + */ +static void +hwloc_linux_infiniband_class_fillinfos(struct hwloc_backend *backend, + struct hwloc_obj *obj, const char *osdevpath) +{ + struct hwloc_linux_backend_data_s *data = backend->private_data; + int root_fd = data->root_fd; + FILE *fd; + char path[256]; + unsigned i,j; + + snprintf(path, sizeof(path), "%s/node_guid", osdevpath); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char guidvalue[20]; + if (fgets(guidvalue, sizeof(guidvalue), fd)) { + size_t len; + len = strspn(guidvalue, "0123456789abcdefx:"); + assert(len == 19); + guidvalue[len] = '\0'; + hwloc_obj_add_info(obj, "NodeGUID", guidvalue); + } + fclose(fd); + } + + snprintf(path, sizeof(path), "%s/sys_image_guid", osdevpath); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char guidvalue[20]; + if (fgets(guidvalue, sizeof(guidvalue), fd)) { + size_t len; + len = strspn(guidvalue, "0123456789abcdefx:"); + assert(len == 19); + guidvalue[len] = '\0'; + hwloc_obj_add_info(obj, "SysImageGUID", guidvalue); + } + fclose(fd); + } + + for(i=1; ; i++) { + snprintf(path, sizeof(path), "%s/ports/%u/state", osdevpath, i); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char statevalue[2]; + if (fgets(statevalue, sizeof(statevalue), fd)) { + char statename[32]; + statevalue[1] = '\0'; /* only keep the first byte/digit */ + snprintf(statename, sizeof(statename), "Port%uState", i); + hwloc_obj_add_info(obj, statename, statevalue); + } + fclose(fd); + } else { + /* no such port */ + break; + } + + snprintf(path, sizeof(path), "%s/ports/%u/lid", osdevpath, i); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char lidvalue[11]; + if (fgets(lidvalue, sizeof(lidvalue), fd)) { + char lidname[32]; + size_t len; + len = strspn(lidvalue, "0123456789abcdefx"); + lidvalue[len] = '\0'; + snprintf(lidname, sizeof(lidname), "Port%uLID", i); + hwloc_obj_add_info(obj, lidname, lidvalue); + } + fclose(fd); + } + + snprintf(path, sizeof(path), "%s/ports/%u/lid_mask_count", osdevpath, i); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char lidvalue[11]; + if (fgets(lidvalue, sizeof(lidvalue), fd)) { + char lidname[32]; + size_t len; + len = strspn(lidvalue, "0123456789"); + lidvalue[len] = '\0'; + snprintf(lidname, sizeof(lidname), "Port%uLMC", i); + hwloc_obj_add_info(obj, lidname, lidvalue); + } + fclose(fd); + } + + for(j=0; ; j++) { + snprintf(path, sizeof(path), "%s/ports/%u/gids/%u", osdevpath, i, j); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char gidvalue[40]; + if (fgets(gidvalue, sizeof(gidvalue), fd)) { + char gidname[32]; + size_t len; + len = strspn(gidvalue, "0123456789abcdefx:"); + assert(len == 39); + gidvalue[len] = '\0'; + if (strncmp(gidvalue+20, "0000:0000:0000:0000", 19)) { + /* only keep initialized GIDs */ + snprintf(gidname, sizeof(gidname), "Port%uGID%u", i, j); + 
hwloc_obj_add_info(obj, gidname, gidvalue); + } + } + fclose(fd); + } else { + /* no such port */ + break; + } + } + } +} + +static int +hwloc_linux_lookup_openfabrics_class(struct hwloc_backend *backend, + struct hwloc_obj *pcidev, const char *pcidevpath) +{ + return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_OPENFABRICS, "infiniband", hwloc_linux_infiniband_class_fillinfos); +} + +/* look for dma objects below a pcidev in sysfs */ +static int +hwloc_linux_lookup_dma_class(struct hwloc_backend *backend, + struct hwloc_obj *pcidev, const char *pcidevpath) +{ + return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_DMA, "dma", NULL); +} + +/* look for drm objects below a pcidev in sysfs */ +static int +hwloc_linux_lookup_drm_class(struct hwloc_backend *backend, + struct hwloc_obj *pcidev, const char *pcidevpath) +{ + return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_GPU, "drm", NULL); + + /* we could look at the "graphics" class too, but it doesn't help for proprietary drivers either */ + + /* GPU devices (even with a proprietary driver) seem to have a boot_vga field in their PCI device directory (since 2.6.30), + * so we could create a OS device for each PCI devices with such a field. + * boot_vga is actually created when class >> 8 == VGA (it contains 1 for boot vga device), so it's trivial anyway. + */ +} + +/* + * look for block objects below a pcidev in sysfs + */ + +/* block class objects are in + * host%d/target%d:%d:%d/%d:%d:%d:%d/ + * or + * host%d/port-%d:%d/end_device-%d:%d/target%d:%d:%d/%d:%d:%d:%d/ + * or + * ide%d/%d.%d/ + * below pci devices */ +static int +hwloc_linux_lookup_host_block_class(struct hwloc_backend *backend, + struct hwloc_obj *pcidev, char *path, size_t pathlen) +{ + struct hwloc_linux_backend_data_s *data = backend->private_data; + int root_fd = data->root_fd; + DIR *hostdir, *portdir, *targetdir; + struct dirent *hostdirent, *portdirent, *targetdirent; + size_t hostdlen, portdlen, targetdlen; + int dummy; + int res = 0; + + hostdir = hwloc_opendir(path, root_fd); + if (!hostdir) + return 0; + + while ((hostdirent = readdir(hostdir)) != NULL) { + if (sscanf(hostdirent->d_name, "port-%d:%d", &dummy, &dummy) == 2) + { + /* found host%d/port-%d:%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], hostdirent->d_name); + pathlen += hostdlen = 1+strlen(hostdirent->d_name); + portdir = hwloc_opendir(path, root_fd); + if (!portdir) + continue; + while ((portdirent = readdir(portdir)) != NULL) { + if (sscanf(portdirent->d_name, "end_device-%d:%d", &dummy, &dummy) == 2) { + /* found host%d/port-%d:%d/end_device-%d:%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], portdirent->d_name); + pathlen += portdlen = 1+strlen(portdirent->d_name); + res += hwloc_linux_lookup_host_block_class(backend, pcidev, path, pathlen); + /* restore parent path */ + pathlen -= portdlen; + path[pathlen] = '\0'; + } + } + closedir(portdir); + /* restore parent path */ + pathlen -= hostdlen; + path[pathlen] = '\0'; + continue; + } else if (sscanf(hostdirent->d_name, "target%d:%d:%d", &dummy, &dummy, &dummy) == 3) { + /* found host%d/target%d:%d:%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], hostdirent->d_name); + pathlen += hostdlen = 1+strlen(hostdirent->d_name); + targetdir = hwloc_opendir(path, root_fd); + if (!targetdir) + continue; + while ((targetdirent = readdir(targetdir)) != NULL) { + if (sscanf(targetdirent->d_name, "%d:%d:%d:%d", &dummy, &dummy, &dummy, &dummy) != 4) + continue; + /* 
found host%d/target%d:%d:%d/%d:%d:%d:%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], targetdirent->d_name); + pathlen += targetdlen = 1+strlen(targetdirent->d_name); + /* lookup block class for real */ + res += hwloc_linux_class_readdir(backend, pcidev, path, HWLOC_OBJ_OSDEV_BLOCK, "block", NULL); + /* restore parent path */ + pathlen -= targetdlen; + path[pathlen] = '\0'; + } + closedir(targetdir); + /* restore parent path */ + pathlen -= hostdlen; + path[pathlen] = '\0'; + } + } + closedir(hostdir); + + return res; +} + +static int +hwloc_linux_lookup_block_class(struct hwloc_backend *backend, + struct hwloc_obj *pcidev, const char *pcidevpath) +{ + struct hwloc_linux_backend_data_s *data = backend->private_data; + int root_fd = data->root_fd; + size_t pathlen; + DIR *devicedir, *hostdir; + struct dirent *devicedirent, *hostdirent; + size_t devicedlen, hostdlen; + char path[256]; + int dummy; + int res = 0; + + strcpy(path, pcidevpath); + pathlen = strlen(path); + + devicedir = hwloc_opendir(pcidevpath, root_fd); + if (!devicedir) + return 0; + + while ((devicedirent = readdir(devicedir)) != NULL) { + if (sscanf(devicedirent->d_name, "ide%d", &dummy) == 1) { + /* found ide%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], devicedirent->d_name); + pathlen += devicedlen = 1+strlen(devicedirent->d_name); + hostdir = hwloc_opendir(path, root_fd); + if (!hostdir) + continue; + while ((hostdirent = readdir(hostdir)) != NULL) { + if (sscanf(hostdirent->d_name, "%d.%d", &dummy, &dummy) == 2) { + /* found ide%d/%d.%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], hostdirent->d_name); + pathlen += hostdlen = 1+strlen(hostdirent->d_name); + /* lookup block class for real */ + res += hwloc_linux_class_readdir(backend, pcidev, path, HWLOC_OBJ_OSDEV_BLOCK, "block", NULL); + /* restore parent path */ + pathlen -= hostdlen; + path[pathlen] = '\0'; + } + } + closedir(hostdir); + /* restore parent path */ + pathlen -= devicedlen; + path[pathlen] = '\0'; + } else if (sscanf(devicedirent->d_name, "host%d", &dummy) == 1) { + /* found host%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], devicedirent->d_name); + pathlen += devicedlen = 1+strlen(devicedirent->d_name); + res += hwloc_linux_lookup_host_block_class(backend, pcidev, path, pathlen); + /* restore parent path */ + pathlen -= devicedlen; + path[pathlen] = '\0'; + } else if (sscanf(devicedirent->d_name, "ata%d", &dummy) == 1) { + /* found ata%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], devicedirent->d_name); + pathlen += devicedlen = 1+strlen(devicedirent->d_name); + hostdir = hwloc_opendir(path, root_fd); + if (!hostdir) + continue; + while ((hostdirent = readdir(hostdir)) != NULL) { + if (sscanf(hostdirent->d_name, "host%d", &dummy) == 1) { + /* found ata%d/host%d */ + path[pathlen] = '/'; + strcpy(&path[pathlen+1], hostdirent->d_name); + pathlen += hostdlen = 1+strlen(hostdirent->d_name); + /* lookup block class for real */ + res += hwloc_linux_lookup_host_block_class(backend, pcidev, path, pathlen); + /* restore parent path */ + pathlen -= hostdlen; + path[pathlen] = '\0'; + } + } + closedir(hostdir); + /* restore parent path */ + pathlen -= devicedlen; + path[pathlen] = '\0'; + } + } + closedir(devicedir); + + return res; +} + +static void +hwloc_linux_mic_class_fillinfos(struct hwloc_backend *backend, + struct hwloc_obj *obj, const char *osdevpath) +{ + struct hwloc_linux_backend_data_s *data = backend->private_data; + int root_fd = data->root_fd; + FILE *fd; + char path[256]; + + hwloc_obj_add_info(obj, 
"CoProcType", "MIC"); + + snprintf(path, sizeof(path), "%s/family", osdevpath); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char family[64]; + if (fgets(family, sizeof(family), fd)) { + char *eol = strchr(family, '\n'); + if (eol) + *eol = 0; + hwloc_obj_add_info(obj, "MICFamily", family); + } + fclose(fd); + } + + snprintf(path, sizeof(path), "%s/sku", osdevpath); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char sku[64]; + if (fgets(sku, sizeof(sku), fd)) { + char *eol = strchr(sku, '\n'); + if (eol) + *eol = 0; + hwloc_obj_add_info(obj, "MICSKU", sku); + } + fclose(fd); + } + + snprintf(path, sizeof(path), "%s/serialnumber", osdevpath); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char sn[64]; + if (fgets(sn, sizeof(sn), fd)) { + char *eol = strchr(sn, '\n'); + if (eol) + *eol = 0; + hwloc_obj_add_info(obj, "MICSerialNumber", sn); + } + fclose(fd); + } + + snprintf(path, sizeof(path), "%s/active_cores", osdevpath); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char string[10]; + if (fgets(string, sizeof(string), fd)) { + unsigned long count = strtoul(string, NULL, 16); + snprintf(string, sizeof(string), "%lu", count); + hwloc_obj_add_info(obj, "MICActiveCores", string); + } + fclose(fd); + } + + snprintf(path, sizeof(path), "%s/memsize", osdevpath); + fd = hwloc_fopen(path, "r", root_fd); + if (fd) { + char string[20]; + if (fgets(string, sizeof(string), fd)) { + unsigned long count = strtoul(string, NULL, 16); + snprintf(string, sizeof(string), "%lu", count); + hwloc_obj_add_info(obj, "MICMemorySize", string); + } + fclose(fd); + } +} + +static int +hwloc_linux_lookup_mic_class(struct hwloc_backend *backend, + struct hwloc_obj *pcidev, const char *pcidevpath) +{ + return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_COPROC, "mic", hwloc_linux_mic_class_fillinfos); +} + +static int +hwloc_linux_directlookup_mic_class(struct hwloc_backend *backend, + struct hwloc_obj *pcidev) +{ + struct hwloc_linux_backend_data_s *data = backend->private_data; + int root_fd = data->root_fd; + char path[256]; + struct stat st; + hwloc_obj_t obj; + unsigned idx; + int res = 0; + + if (!data->mic_directlookup_id_max) + /* already tried, nothing to do */ + return 0; + + if (data->mic_directlookup_id_max == (unsigned) -1) { + /* never tried, find out the max id */ + DIR *dir; + struct dirent *dirent; + + /* make sure we never do this lookup again */ + data->mic_directlookup_id_max = 0; + + /* read the entire class and find the max id of mic%u dirents */ + dir = hwloc_opendir("/sys/devices/virtual/mic", root_fd); + if (!dir) { + dir = opendir("/sys/class/mic"); + if (!dir) + return 0; + } + while ((dirent = readdir(dir)) != NULL) { + if (!strcmp(dirent->d_name, ".") || !strcmp(dirent->d_name, "..")) + continue; + if (sscanf(dirent->d_name, "mic%u", &idx) != 1) + continue; + if (idx >= data->mic_directlookup_id_max) + data->mic_directlookup_id_max = idx+1; + } + closedir(dir); + } + + /* now iterate over the mic ids and see if one matches our pcidev */ + for(idx=0; idxmic_directlookup_id_max; idx++) { + snprintf(path, sizeof(path), "/sys/class/mic/mic%u/pci_%02x:%02x.%02x", + idx, pcidev->attr->pcidev.bus, pcidev->attr->pcidev.dev, pcidev->attr->pcidev.func); + if (hwloc_stat(path, &st, root_fd) < 0) + continue; + snprintf(path, sizeof(path), "mic%u", idx); + obj = hwloc_linux_add_os_device(backend, pcidev, HWLOC_OBJ_OSDEV_COPROC, path); + snprintf(path, sizeof(path), "/sys/class/mic/mic%u", idx); + hwloc_linux_mic_class_fillinfos(backend, obj, 
path); + res++; + } + + return res; +} + +/* + * backend callback for inserting objects inside a pci device + */ +static int +hwloc_linux_backend_notify_new_object(struct hwloc_backend *backend, struct hwloc_backend *caller __hwloc_attribute_unused, + struct hwloc_obj *obj) +{ + struct hwloc_linux_backend_data_s *data = backend->private_data; + char pcidevpath[256]; + int res = 0; + + /* this callback is only used in the libpci backend for now */ + assert(obj->type == HWLOC_OBJ_PCI_DEVICE); + + snprintf(pcidevpath, sizeof(pcidevpath), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/", + obj->attr->pcidev.domain, obj->attr->pcidev.bus, + obj->attr->pcidev.dev, obj->attr->pcidev.func); + + res += hwloc_linux_lookup_net_class(backend, obj, pcidevpath); + res += hwloc_linux_lookup_openfabrics_class(backend, obj, pcidevpath); + res += hwloc_linux_lookup_dma_class(backend, obj, pcidevpath); + res += hwloc_linux_lookup_drm_class(backend, obj, pcidevpath); + res += hwloc_linux_lookup_block_class(backend, obj, pcidevpath); + + if (data->mic_need_directlookup == -1) { + struct stat st; + if (hwloc_stat("/sys/class/mic/mic0", &st, data->root_fd) == 0 + && hwloc_stat("/sys/class/mic/mic0/device/mic/mic0", &st, data->root_fd) == -1) + /* hwloc_linux_lookup_mic_class will fail because pcidev sysfs directories + * do not have mic/mic%u symlinks to mic devices (old mic driver). + * if so, try from the mic class. + */ + data->mic_need_directlookup = 1; + else + data->mic_need_directlookup = 0; + } + if (data->mic_need_directlookup) + res += hwloc_linux_directlookup_mic_class(backend, obj); + else + res += hwloc_linux_lookup_mic_class(backend, obj, pcidevpath); + + return res; +} + +/* + * backend callback for retrieving the location of a pci device + */ +static int +hwloc_linux_backend_get_obj_cpuset(struct hwloc_backend *backend, + struct hwloc_backend *caller __hwloc_attribute_unused, + struct hwloc_obj *obj, hwloc_bitmap_t cpuset) +{ + struct hwloc_linux_backend_data_s *data = backend->private_data; + char path[256]; + FILE *file; + int err; + + /* this callback is only used in the libpci backend for now */ + assert(obj->type == HWLOC_OBJ_PCI_DEVICE + || (obj->type == HWLOC_OBJ_BRIDGE && obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI)); + + snprintf(path, sizeof(path), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/local_cpus", + obj->attr->pcidev.domain, obj->attr->pcidev.bus, + obj->attr->pcidev.dev, obj->attr->pcidev.func); + file = hwloc_fopen(path, "r", data->root_fd); + if (file) { + err = hwloc_linux_parse_cpumap_file(file, cpuset); + fclose(file); + if (!err && !hwloc_bitmap_iszero(cpuset)) + return 0; + } + return -1; +} + + + +/******************************* + ******* Linux component ******* + *******************************/ + +static void +hwloc_linux_backend_disable(struct hwloc_backend *backend) +{ + struct hwloc_linux_backend_data_s *data = backend->private_data; +#ifdef HAVE_OPENAT + close(data->root_fd); +#endif + free(data); +} + +static struct hwloc_backend * +hwloc_linux_component_instantiate(struct hwloc_disc_component *component, + const void *_data1, + const void *_data2 __hwloc_attribute_unused, + const void *_data3 __hwloc_attribute_unused) +{ + struct hwloc_backend *backend; + struct hwloc_linux_backend_data_s *data; + const char * fsroot_path = _data1; + int root = -1; + + backend = hwloc_backend_alloc(component); + if (!backend) + goto out; + + data = malloc(sizeof(*data)); + if (!data) { + errno = ENOMEM; + goto out_with_backend; + } + + backend->private_data = data; + 
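+  /* Editor's note: the callbacks wired up below are what the hwloc core
+   * invokes: discover() builds the CPU/memory tree, get_obj_cpuset() and
+   * notify_new_object() serve the PCI backends, disable() tears down.
+   * A minimal sketch of how a user would exercise this backend with a
+   * changed fsroot (hypothetical chroot path, hwloc 1.x API):
+   *   hwloc_topology_t topo;
+   *   hwloc_topology_init(&topo);
+   *   hwloc_topology_set_fsroot(topo, "/path/to/chroot"); // selects this component
+   *   hwloc_topology_load(topo);
+   *   ...
+   *   hwloc_topology_destroy(topo);
+   */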
backend->discover = hwloc_look_linuxfs; + backend->get_obj_cpuset = hwloc_linux_backend_get_obj_cpuset; + backend->notify_new_object = hwloc_linux_backend_notify_new_object; + backend->disable = hwloc_linux_backend_disable; + + /* default values */ + data->is_real_fsroot = 1; + if (!fsroot_path) + fsroot_path = "/"; + +#ifdef HAVE_OPENAT + root = open(fsroot_path, O_RDONLY | O_DIRECTORY); + if (root < 0) + goto out_with_data; + + if (strcmp(fsroot_path, "/")) { + backend->is_thissystem = 0; + data->is_real_fsroot = 0; + } + +#else + if (strcmp(fsroot_path, "/")) { + errno = ENOSYS; + goto out_with_data; + } +#endif + data->root_fd = root; + + data->deprecated_classlinks_model = -2; /* never tried */ + data->mic_need_directlookup = -1; /* not initialized */ + data->mic_directlookup_id_max = -1; /* not initialized */ + + return backend; + + out_with_data: + free(data); + out_with_backend: + free(backend); + out: + return NULL; +} + +static struct hwloc_disc_component hwloc_linux_disc_component = { + HWLOC_DISC_COMPONENT_TYPE_CPU, + "linux", + HWLOC_DISC_COMPONENT_TYPE_GLOBAL, + hwloc_linux_component_instantiate, + 50, + NULL +}; + +const struct hwloc_component hwloc_linux_component = { + HWLOC_COMPONENT_ABI, + HWLOC_COMPONENT_TYPE_DISC, + 0, + &hwloc_linux_disc_component +}; + + + + +#ifdef HWLOC_HAVE_LINUXPCI + +/*********************************** + ******* Linux PCI component ******* + ***********************************/ + +#define HWLOC_PCI_REVISION_ID 0x08 +#define HWLOC_PCI_CAP_ID_EXP 0x10 +#define HWLOC_PCI_CLASS_NOT_DEFINED 0x0000 + +static int +hwloc_look_linuxfs_pci(struct hwloc_backend *backend) +{ + struct hwloc_topology *topology = backend->topology; + struct hwloc_backend *tmpbackend; + hwloc_obj_t first_obj = NULL, last_obj = NULL; + int root_fd = -1; + DIR *dir; + struct dirent *dirent; + int res = 0; + + if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) + return 0; + + if (hwloc_get_next_pcidev(topology, NULL)) { + hwloc_debug("%s", "PCI objects already added, ignoring linuxpci backend.\n"); + return 0; + } + + /* hackily find the linux backend to steal its fsroot */ + tmpbackend = topology->backends; + while (tmpbackend) { + if (tmpbackend->component == &hwloc_linux_disc_component) { + root_fd = ((struct hwloc_linux_backend_data_s *) tmpbackend->private_data)->root_fd; + hwloc_debug("linuxpci backend stole linux backend root_fd %d\n", root_fd); + break; } + tmpbackend = tmpbackend->next; + } + /* take our own descriptor, either pointing to linux fsroot, or to / if not found */ + if (root_fd >= 0) + root_fd = dup(root_fd); + else + root_fd = open("/", O_RDONLY | O_DIRECTORY); + + dir = hwloc_opendir("/sys/bus/pci/devices/", root_fd); + if (!dir) + goto out_with_rootfd; + + while ((dirent = readdir(dir)) != NULL) { + unsigned domain, bus, dev, func; + hwloc_obj_t obj; + struct hwloc_pcidev_attr_s *attr; + unsigned os_index; + char path[64]; + char value[16]; + FILE *file; + + if (sscanf(dirent->d_name, "%04x:%02x:%02x.%01x", &domain, &bus, &dev, &func) != 4) + continue; + + os_index = (domain << 20) + (bus << 12) + (dev << 4) + func; + obj = hwloc_alloc_setup_object(HWLOC_OBJ_PCI_DEVICE, os_index); + if (!obj) + break; + attr = &obj->attr->pcidev; + + attr->domain = domain; + attr->bus = bus; + attr->dev = dev; + attr->func = func; + + /* default (unknown) values */ + attr->vendor_id = 0; + attr->device_id = 0; + attr->class_id = HWLOC_PCI_CLASS_NOT_DEFINED; + attr->revision = 0; + attr->subvendor_id = 0; + 
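+    /* Editor's note: each sysfs attribute file read below holds one hex value
+     * with a "0x" prefix, e.g. (illustrative device) vendor "0x8086" or class
+     * "0x060400"; strtoul(..., 16) parses it, and the class value is shifted
+     * right by 8 bits to drop the programming-interface byte, leaving 0x0604
+     * (a PCI-to-PCI bridge) in that example.
+     */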
attr->subdevice_id = 0; + attr->linkspeed = 0; + + snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/vendor", dirent->d_name); + file = hwloc_fopen(path, "r", root_fd); + if (file) { + fread(value, sizeof(value), 1, file); + fclose(file); + attr->vendor_id = strtoul(value, NULL, 16); + } + snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/device", dirent->d_name); + file = hwloc_fopen(path, "r", root_fd); + if (file) { + fread(value, sizeof(value), 1, file); + fclose(file); + attr->device_id = strtoul(value, NULL, 16); + } + snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/class", dirent->d_name); + file = hwloc_fopen(path, "r", root_fd); + if (file) { + fread(value, sizeof(value), 1, file); + fclose(file); + attr->class_id = strtoul(value, NULL, 16) >> 8; + } + snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/subsystem_vendor", dirent->d_name); + file = hwloc_fopen(path, "r", root_fd); + if (file) { + fread(value, sizeof(value), 1, file); + fclose(file); + attr->subvendor_id = strtoul(value, NULL, 16); + } + snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/subsystem_device", dirent->d_name); + file = hwloc_fopen(path, "r", root_fd); + if (file) { + fread(value, sizeof(value), 1, file); + fclose(file); + attr->subdevice_id = strtoul(value, NULL, 16); + } + + snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", dirent->d_name); + file = hwloc_fopen(path, "r", root_fd); + if (file) { +#define CONFIG_SPACE_CACHESIZE 256 + unsigned char config_space_cache[CONFIG_SPACE_CACHESIZE]; + unsigned offset; + + /* initialize the config space in case we fail to read it (missing permissions, etc). */ + memset(config_space_cache, 0xff, CONFIG_SPACE_CACHESIZE); + (void) fread(config_space_cache, 1, CONFIG_SPACE_CACHESIZE, file); + fclose(file); + + /* is this a bridge? 
*/
+      hwloc_pci_prepare_bridge(obj, config_space_cache);
+
+      /* get the revision */
+      attr->revision = config_space_cache[HWLOC_PCI_REVISION_ID];
+
+      /* try to get the link speed */
+      offset = hwloc_pci_find_cap(config_space_cache, HWLOC_PCI_CAP_ID_EXP);
+      if (offset > 0 && offset + 20 /* size of PCI express block up to link status */ <= CONFIG_SPACE_CACHESIZE)
+        hwloc_pci_find_linkspeed(config_space_cache, offset, &attr->linkspeed);
+    }
+
+    if (first_obj)
+      last_obj->next_sibling = obj;
+    else
+      first_obj = obj;
+    last_obj = obj;
+  }
+
+  closedir(dir);
+
+  res = hwloc_insert_pci_device_list(backend, first_obj);
+
+ out_with_rootfd:
+  close(root_fd);
+  return res;
+}
+
+static struct hwloc_backend *
+hwloc_linuxpci_component_instantiate(struct hwloc_disc_component *component,
+                                     const void *_data1 __hwloc_attribute_unused,
+                                     const void *_data2 __hwloc_attribute_unused,
+                                     const void *_data3 __hwloc_attribute_unused)
+{
+  struct hwloc_backend *backend;
+
+  /* thissystem may not be fully initialized yet, we'll check flags in discover() */
+
+  backend = hwloc_backend_alloc(component);
+  if (!backend)
+    return NULL;
+  backend->flags = HWLOC_BACKEND_FLAG_NEED_LEVELS;
+  backend->discover = hwloc_look_linuxfs_pci;
+  return backend;
+}
+
+static struct hwloc_disc_component hwloc_linuxpci_disc_component = {
+  HWLOC_DISC_COMPONENT_TYPE_MISC,
+  "linuxpci",
+  HWLOC_DISC_COMPONENT_TYPE_GLOBAL,
+  hwloc_linuxpci_component_instantiate,
+  19, /* after pci */
+  NULL
+};
+
+const struct hwloc_component hwloc_linuxpci_component = {
+  HWLOC_COMPONENT_ABI,
+  HWLOC_COMPONENT_TYPE_DISC,
+  0,
+  &hwloc_linuxpci_disc_component
+};
+
+#endif /* HWLOC_HAVE_LINUXPCI */
diff --git a/ext/hwloc/src/topology-noos.c b/ext/hwloc/src/topology-noos.c
new file mode 100644
index 000000000..8c74ded20
--- /dev/null
+++ b/ext/hwloc/src/topology-noos.c
@@ -0,0 +1,57 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2012 Inria. All rights reserved.
+ * Copyright © 2009-2012 Université Bordeaux 1
+ * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <private/private.h>
+
+static int
+hwloc_look_noos(struct hwloc_backend *backend)
+{
+  struct hwloc_topology *topology = backend->topology;
+
+  if (topology->levels[0][0]->cpuset)
+    /* somebody discovered things */
+    return 0;
+
+  hwloc_alloc_obj_cpusets(topology->levels[0][0]);
+  hwloc_setup_pu_level(topology, hwloc_fallback_nbprocessors(topology));
+  if (topology->is_thissystem)
+    hwloc_add_uname_info(topology);
+  return 1;
+}
+
+static struct hwloc_backend *
+hwloc_noos_component_instantiate(struct hwloc_disc_component *component,
+                                 const void *_data1 __hwloc_attribute_unused,
+                                 const void *_data2 __hwloc_attribute_unused,
+                                 const void *_data3 __hwloc_attribute_unused)
+{
+  struct hwloc_backend *backend;
+  backend = hwloc_backend_alloc(component);
+  if (!backend)
+    return NULL;
+  backend->discover = hwloc_look_noos;
+  return backend;
+}
+
+static struct hwloc_disc_component hwloc_noos_disc_component = {
+  HWLOC_DISC_COMPONENT_TYPE_CPU,
+  "no_os",
+  HWLOC_DISC_COMPONENT_TYPE_GLOBAL,
+  hwloc_noos_component_instantiate,
+  40, /* lower than native OS component, higher than globals */
+  NULL
+};
+
+const struct hwloc_component hwloc_noos_component = {
+  HWLOC_COMPONENT_ABI,
+  HWLOC_COMPONENT_TYPE_DISC,
+  0,
+  &hwloc_noos_disc_component
+};
diff --git a/ext/hwloc/src/topology-nvml.c b/ext/hwloc/src/topology-nvml.c
new file mode 100644
index 000000000..de63266e0
--- /dev/null
+++ b/ext/hwloc/src/topology-nvml.c
@@ -0,0 +1,232 @@
+/*
+ * Copyright © 2012 Inria. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <hwloc/plugins.h>
+
+/* private headers allowed for convenience because this plugin is built within hwloc */
+#include <private/misc.h>
+#include <private/debug.h>
+
+#include <nvml.h>
+
+struct hwloc_nvml_backend_data_s {
+  unsigned nr_devices; /* -1 when unknown yet, first callback will setup */
+  struct hwloc_nvml_device_info_s {
+    char name[64];
+    char serial[64];
+    char uuid[64];
+    unsigned pcidomain, pcibus, pcidev, pcifunc;
+    float maxlinkspeed;
+  } * devices;
+};
+
+static void
+hwloc_nvml_query_devices(struct hwloc_nvml_backend_data_s *data)
+{
+  nvmlReturn_t ret;
+  unsigned nb, i;
+
+  /* mark the number of devices as 0 in case we fail below,
+   * so that we don't try again later.
+   */
+  data->nr_devices = 0;
+
+  ret = nvmlInit();
+  if (NVML_SUCCESS != ret)
+    goto out;
+  ret = nvmlDeviceGetCount(&nb);
+  if (NVML_SUCCESS != ret)
+    goto out_with_init;
+
+  /* allocate structs */
+  data->devices = malloc(nb * sizeof(*data->devices));
+  if (!data->devices)
+    goto out_with_init;
+
+  for(i=0; i<nb; i++) {
+    struct hwloc_nvml_device_info_s *info = &data->devices[data->nr_devices];
+    nvmlPciInfo_t pci;
+    nvmlDevice_t device;
+
+    ret = nvmlDeviceGetHandleByIndex(i, &device);
+    assert(ret == NVML_SUCCESS);
+
+    ret = nvmlDeviceGetPciInfo(device, &pci);
+    if (NVML_SUCCESS != ret)
+      continue;
+
+    info->pcidomain = pci.domain;
+    info->pcibus = pci.bus;
+    info->pcidev = pci.device;
+    info->pcifunc = 0;
+
+    info->name[0] = '\0';
+    ret = nvmlDeviceGetName(device, info->name, sizeof(info->name));
+    /* these may fail with NVML_ERROR_NOT_SUPPORTED on old devices */
+    info->serial[0] = '\0';
+    ret = nvmlDeviceGetSerial(device, info->serial, sizeof(info->serial));
+    info->uuid[0] = '\0';
+    ret = nvmlDeviceGetUUID(device, info->uuid, sizeof(info->uuid));
+
+    info->maxlinkspeed = 0.0f;
+#if HAVE_DECL_NVMLDEVICEGETMAXPCIELINKGENERATION
+    {
+      unsigned maxwidth = 0, maxgen = 0;
+      float lanespeed;
+      nvmlDeviceGetMaxPcieLinkWidth(device, &maxwidth);
+      nvmlDeviceGetMaxPcieLinkGeneration(device, &maxgen);
+      /* PCIe Gen1 = 2.5GT/s signal-rate per lane with 8/10 encoding    = 0.25GB/s data-rate per lane
+       * PCIe Gen2 = 5  GT/s signal-rate per lane with 8/10 encoding    = 0.5 GB/s data-rate per lane
+       * PCIe Gen3 = 8  GT/s signal-rate per lane with 128/130 encoding = 1   GB/s data-rate per lane
+       */
+      lanespeed = maxgen <= 2 ? 2.5 * maxgen * 0.8 : 8.0 * 128/130; /* Gbit/s per lane */
+      info->maxlinkspeed = lanespeed * maxwidth / 8; /* GB/s */
+    }
+#endif
+
+    /* validate this device */
+    data->nr_devices++;
+  }
+
+out_with_init:
+  nvmlShutdown();
+out:
+  return;
+}
+
+static int
+hwloc_nvml_backend_notify_new_object(struct hwloc_backend *backend, struct hwloc_backend *caller __hwloc_attribute_unused,
+                                     struct hwloc_obj *pcidev)
+{
+  struct hwloc_topology *topology = backend->topology;
+  struct hwloc_nvml_backend_data_s *data = backend->private_data;
+  unsigned i;
+
+  if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO)))
+    return 0;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    hwloc_debug("%s", "\nno NVML detection (not thissystem)\n");
+    return 0;
+  }
+
+  if (HWLOC_OBJ_PCI_DEVICE != pcidev->type)
+    return 0;
+
+  if (data->nr_devices == (unsigned) -1) {
+    /* first call, lookup all devices */
+    hwloc_nvml_query_devices(data);
+    /* if it fails, data->nr_devices = 0 so we won't do anything below and in next callbacks */
+  }
+
+  if (!data->nr_devices)
+    /* found no devices */
+    return 0;
+
+  /* now the devices array is ready to use */
+  for(i=0; i<data->nr_devices; i++) {
+    struct hwloc_nvml_device_info_s *info = &data->devices[i];
+    hwloc_obj_t osdev;
+    char buffer[64];
+
+    if (info->pcidomain != pcidev->attr->pcidev.domain)
+      continue;
+    if (info->pcibus != pcidev->attr->pcidev.bus)
+      continue;
+    if (info->pcidev != pcidev->attr->pcidev.dev)
+      continue;
+    if (info->pcifunc != pcidev->attr->pcidev.func)
+      continue;
+
+    osdev = hwloc_alloc_setup_object(HWLOC_OBJ_OS_DEVICE, -1);
+    snprintf(buffer, sizeof(buffer), "nvml%d", i);
+    osdev->name = strdup(buffer);
+    osdev->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN;
+    osdev->attr->osdev.type = HWLOC_OBJ_OSDEV_GPU;
+
+    hwloc_obj_add_info(osdev, "Backend", "NVML");
+    hwloc_obj_add_info(osdev, "GPUVendor", "NVIDIA Corporation");
+    hwloc_obj_add_info(osdev, "GPUModel", info->name);
+    if (info->serial[0] != '\0')
+      hwloc_obj_add_info(osdev, "NVIDIASerial", info->serial);
+    if (info->uuid[0] != '\0')
+      hwloc_obj_add_info(osdev, "NVIDIAUUID", info->uuid);
+
+    hwloc_insert_object_by_parent(topology, pcidev, osdev);
+
+    if (info->maxlinkspeed != 0.0f)
+      /* we found the max link speed, replace the current link speed found by pci (or none) */
+      pcidev->attr->pcidev.linkspeed = info->maxlinkspeed;
+
+    return 1;
+  }
+
+  return 0;
+}
+
+static void
+hwloc_nvml_backend_disable(struct hwloc_backend *backend)
+{
+  struct hwloc_nvml_backend_data_s *data = backend->private_data;
+  free(data->devices);
+  free(data);
+}
+
+static struct hwloc_backend *
+hwloc_nvml_component_instantiate(struct hwloc_disc_component *component,
+                                 const void *_data1 __hwloc_attribute_unused,
+                                 const void *_data2 __hwloc_attribute_unused,
+                                 const void *_data3 __hwloc_attribute_unused)
+{
+  struct hwloc_backend *backend;
+  struct hwloc_nvml_backend_data_s *data;
+
+  if (hwloc_plugin_check_namespace(component->name, "hwloc_backend_alloc") < 0)
+    return NULL;
+
+  /* thissystem may not be fully initialized yet, we'll check flags in discover() */
+
+  backend = hwloc_backend_alloc(component);
+  if (!backend)
+    return NULL;
+
+  data = malloc(sizeof(*data));
+  if (!data) {
+    free(backend);
+    return NULL;
+  }
+  /* the first callback will initialize those */
+  data->nr_devices = (unsigned) -1; /* unknown yet */
+  data->devices = NULL;
+
+  backend->private_data = data;
+  backend->disable = hwloc_nvml_backend_disable;
+
+  backend->notify_new_object = hwloc_nvml_backend_notify_new_object;
+  return backend;
+}
+
+static struct hwloc_disc_component hwloc_nvml_disc_component = {
+  HWLOC_DISC_COMPONENT_TYPE_MISC,
+  "nvml",
+  HWLOC_DISC_COMPONENT_TYPE_GLOBAL,
+  hwloc_nvml_component_instantiate,
+  5, /* after pci, and after cuda since likely less useful */
+  NULL
+};
+
+#ifdef HWLOC_INSIDE_PLUGIN
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_nvml_component;
+#endif
+
+const struct hwloc_component hwloc_nvml_component = {
+  HWLOC_COMPONENT_ABI,
+  HWLOC_COMPONENT_TYPE_DISC,
+  0,
+  &hwloc_nvml_disc_component
+};
diff --git a/ext/hwloc/src/topology-opencl.c b/ext/hwloc/src/topology-opencl.c
new file mode 100644
index 000000000..169679778
--- /dev/null
+++ b/ext/hwloc/src/topology-opencl.c
@@ -0,0 +1,321 @@
+/*
+ * Copyright © 2012 Inria. All rights reserved.
+ * Copyright © 2013 Université Bordeaux 1. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <hwloc/plugins.h>
+
+/* private headers allowed for convenience because this plugin is built within hwloc */
+#include <private/misc.h>
+#include <private/debug.h>
+
+#include <CL/cl_ext.h>
+
+typedef enum hwloc_opencl_device_type_e {
+  HWLOC_OPENCL_DEVICE_AMD
+} hwloc_opencl_device_type_t;
+
+struct hwloc_opencl_backend_data_s {
+  unsigned nr_devices; /* -1 when unknown yet, first callback will setup */
+  struct hwloc_opencl_device_info_s {
+    hwloc_opencl_device_type_t type;
+
+    unsigned platformidx;
+    char platformname[64];
+    unsigned platformdeviceidx;
+    char devicename[64];
+    char devicevendor[64];
+    char devicetype[64];
+
+    union hwloc_opencl_device_info_u {
+      struct hwloc_opencl_device_info_amd_s {
+        unsigned pcidomain, pcibus, pcidev, pcifunc;
+      } amd;
+    } specific;
+  } * devices;
+};
+
+static void
+hwloc_opencl_query_devices(struct hwloc_opencl_backend_data_s *data)
+{
+  cl_platform_id *platform_ids = NULL;
+  cl_uint nr_platforms;
+  cl_device_id *device_ids = NULL;
+  cl_uint nr_devices, nr_total_devices, tmp;
+  cl_int clret;
+  unsigned curpfidx, curpfdvidx, i;
+
+  /* mark the number of devices as 0 in case we fail below,
+   * so that we don't try again later.
+   */
+  data->nr_devices = 0;
+
+  /* count platforms, allocate and get them */
+  clret = clGetPlatformIDs(0, NULL, &nr_platforms);
+  if (CL_SUCCESS != clret || !nr_platforms)
+    goto out;
+  hwloc_debug("%u OpenCL platforms\n", nr_platforms);
+  platform_ids = malloc(nr_platforms * sizeof(*platform_ids));
+  if (!platform_ids)
+    goto out;
+  clret = clGetPlatformIDs(nr_platforms, platform_ids, &nr_platforms);
+  if (CL_SUCCESS != clret || !nr_platforms)
+    goto out_with_platform_ids;
+
+  /* how many devices, total? */
+  tmp = 0;
+  for(i=0; i<nr_platforms; i++) {
+    clret = clGetDeviceIDs(platform_ids[i], CL_DEVICE_TYPE_ALL, 0, NULL, &nr_devices);
+    if (CL_SUCCESS != clret)
+      goto out_with_platform_ids;
+    tmp += nr_devices;
+  }
+  nr_total_devices = tmp;
+
+  /* allocate structs */
+  device_ids = malloc(nr_total_devices * sizeof(*device_ids));
+  data->devices = malloc(nr_total_devices * sizeof(*data->devices));
+  if (!data->devices || !device_ids)
+    goto out_with_device_ids;
+
+  /* actually query device ids */
+  tmp = 0;
+  for(i=0; i<nr_platforms; i++) {
+    nr_devices = nr_total_devices - tmp;
+    clret = clGetDeviceIDs(platform_ids[i], CL_DEVICE_TYPE_ALL, nr_devices, device_ids + tmp, &nr_devices);
+    if (CL_SUCCESS != clret)
+      goto out_with_device_ids;
+    tmp += nr_devices;
+  }
+  nr_total_devices = tmp;
+
+  curpfidx = 0;
+  curpfdvidx = 0;
+  for(i=0; i<nr_total_devices; i++) {
+    struct hwloc_opencl_device_info_s *info = &data->devices[data->nr_devices];
+    cl_platform_id platform_id = 0;
+    cl_device_type type;
+#ifdef CL_DEVICE_TOPOLOGY_AMD
+    cl_device_topology_amd amdtopo;
+#endif
+
+    hwloc_debug("Looking device %p\n", device_ids[i]);
+
+    info->platformname[0] = '\0';
+    clret = clGetDeviceInfo(device_ids[i], CL_DEVICE_PLATFORM, sizeof(platform_id), &platform_id, NULL);
+    if (CL_SUCCESS != clret)
+      continue;
+    clGetPlatformInfo(platform_id, CL_PLATFORM_NAME, sizeof(info->platformname), info->platformname, NULL);
+
+    info->devicename[0] = '\0';
+#ifdef CL_DEVICE_BOARD_NAME_AMD
+    clGetDeviceInfo(device_ids[i], CL_DEVICE_BOARD_NAME_AMD, sizeof(info->devicename), info->devicename, NULL);
+#else
+    clGetDeviceInfo(device_ids[i], CL_DEVICE_NAME, sizeof(info->devicename), info->devicename, NULL);
+#endif
+    info->devicevendor[0] = '\0';
+    clGetDeviceInfo(device_ids[i], CL_DEVICE_VENDOR, sizeof(info->devicevendor), info->devicevendor, NULL);
+
+    clGetDeviceInfo(device_ids[i], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
+    switch (type) {
+    case CL_DEVICE_TYPE_CPU: /* FIXME: cannot happen in PCI devices? */
+      strcpy(info->devicetype, "CPU");
+      break;
+    case CL_DEVICE_TYPE_GPU:
+      strcpy(info->devicetype, "GPU");
+      break;
+    case CL_DEVICE_TYPE_ACCELERATOR:
+      strcpy(info->devicetype, "Accelerator");
+      break;
+    default:
+      strcpy(info->devicetype, "Unknown");
+      break;
+    }
+
+    hwloc_debug("platform %s device %s vendor %s type %s\n", info->platformname, info->devicename, info->devicevendor, info->devicetype);
+
+    /* find our indexes */
+    while (platform_id != platform_ids[curpfidx]) {
+      curpfidx++;
+      curpfdvidx = 0;
+    }
+    info->platformidx = curpfidx;
+    info->platformdeviceidx = curpfdvidx;
+    curpfdvidx++;
+
+    hwloc_debug("This is opencl%dd%d\n", info->platformidx, info->platformdeviceidx);
+
+#ifdef CL_DEVICE_TOPOLOGY_AMD
+    clret = clGetDeviceInfo(device_ids[i], CL_DEVICE_TOPOLOGY_AMD, sizeof(amdtopo), &amdtopo, NULL);
+    if (CL_SUCCESS != clret) {
+      hwloc_debug("no AMD-specific device information: %d\n", clret);
+      continue;
+    }
+    if (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD != amdtopo.raw.type) {
+      hwloc_debug("not a PCIe device: %u\n", amdtopo.raw.type);
+      continue;
+    }
+
+    info->type = HWLOC_OPENCL_DEVICE_AMD;
+    info->specific.amd.pcidomain = 0;
+    info->specific.amd.pcibus = amdtopo.pcie.bus;
+    info->specific.amd.pcidev = amdtopo.pcie.device;
+    info->specific.amd.pcifunc = amdtopo.pcie.function;
+
+    hwloc_debug("OpenCL device on PCI 0000:%02x:%02x.%u\n", amdtopo.pcie.bus, amdtopo.pcie.device, amdtopo.pcie.function);
+
+    /* validate this device */
+    data->nr_devices++;
+#endif /* CL_DEVICE_TOPOLOGY_AMD */
+  }
+  free(device_ids);
+  free(platform_ids);
+  return;
+
+out_with_device_ids:
+  free(device_ids);
+  free(data->devices);
+  data->devices = NULL;
+out_with_platform_ids:
+  free(platform_ids);
+out:
+  return;
+}
+
+static int
+hwloc_opencl_backend_notify_new_object(struct hwloc_backend *backend, struct hwloc_backend *caller __hwloc_attribute_unused,
+                                       struct hwloc_obj *pcidev)
+{
+  struct hwloc_topology *topology = backend->topology;
+  struct hwloc_opencl_backend_data_s *data = backend->private_data;
+  unsigned i;
+
+  if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO)))
+    return 0;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    hwloc_debug("%s", "\nno OpenCL detection (not thissystem)\n");
+    return 0;
+  }
+
+  if (HWLOC_OBJ_PCI_DEVICE != pcidev->type)
+    return 0;
+
+  if (data->nr_devices == (unsigned) -1) {
+    /* first call, lookup all devices */
+    hwloc_opencl_query_devices(data);
+    /* if it fails, data->nr_devices = 0 so we won't do anything below and in next callbacks */
+  }
+
+  if (!data->nr_devices)
+    /* found no devices */
+    return 0;
+
+  /* now the devices array is ready to use */
+  for(i=0; i<data->nr_devices; i++) {
+    struct hwloc_opencl_device_info_s *info = &data->devices[i];
+    hwloc_obj_t osdev;
+    char buffer[64];
+
+    assert(info->type == HWLOC_OPENCL_DEVICE_AMD);
+    if (info->specific.amd.pcidomain != pcidev->attr->pcidev.domain)
+      continue;
+    if (info->specific.amd.pcibus != pcidev->attr->pcidev.bus)
+      continue;
+    if (info->specific.amd.pcidev != pcidev->attr->pcidev.dev)
+      continue;
+    if (info->specific.amd.pcifunc != pcidev->attr->pcidev.func)
+      continue;
+
+    osdev = hwloc_alloc_setup_object(HWLOC_OBJ_OS_DEVICE, -1);
+    snprintf(buffer, sizeof(buffer), "opencl%dd%d", info->platformidx, info->platformdeviceidx);
+    osdev->name = strdup(buffer);
+    osdev->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN;
+    osdev->attr->osdev.type = HWLOC_OBJ_OSDEV_COPROC;
+
+    hwloc_obj_add_info(osdev, "CoProcType", "OpenCL");
+    hwloc_obj_add_info(osdev, "Backend", "OpenCL");
"Backend", "OpenCL"); + hwloc_obj_add_info(osdev, "OpenCLDeviceType", info->devicetype); + + if (info->devicevendor[0] != '\0') + hwloc_obj_add_info(osdev, "GPUVendor", info->devicevendor); + if (info->devicename[0] != '\0') + hwloc_obj_add_info(osdev, "GPUModel", info->devicename); + + snprintf(buffer, sizeof(buffer), "%u", info->platformidx); + hwloc_obj_add_info(osdev, "OpenCLPlatformIndex", buffer); + if (info->platformname[0] != '\0') + hwloc_obj_add_info(osdev, "OpenCLPlatformName", info->platformname); + + snprintf(buffer, sizeof(buffer), "%u", info->platformdeviceidx); + hwloc_obj_add_info(osdev, "OpenCLPlatformDeviceIndex", buffer); + + hwloc_insert_object_by_parent(topology, pcidev, osdev); + return 1; + } + + return 0; +} + +static void +hwloc_opencl_backend_disable(struct hwloc_backend *backend) +{ + struct hwloc_opencl_backend_data_s *data = backend->private_data; + free(data->devices); + free(data); +} + +static struct hwloc_backend * +hwloc_opencl_component_instantiate(struct hwloc_disc_component *component, + const void *_data1 __hwloc_attribute_unused, + const void *_data2 __hwloc_attribute_unused, + const void *_data3 __hwloc_attribute_unused) +{ + struct hwloc_backend *backend; + struct hwloc_opencl_backend_data_s *data; + + if (hwloc_plugin_check_namespace(component->name, "hwloc_backend_alloc") < 0) + return NULL; + + /* thissystem may not be fully initialized yet, we'll check flags in discover() */ + + backend = hwloc_backend_alloc(component); + if (!backend) + return NULL; + + data = malloc(sizeof(*data)); + if (!data) { + free(backend); + return NULL; + } + /* the first callback will initialize those */ + data->nr_devices = (unsigned) -1; /* unknown yet */ + data->devices = NULL; + + backend->private_data = data; + backend->disable = hwloc_opencl_backend_disable; + + backend->notify_new_object = hwloc_opencl_backend_notify_new_object; + return backend; +} + +static struct hwloc_disc_component hwloc_opencl_disc_component = { + HWLOC_DISC_COMPONENT_TYPE_MISC, + "opencl", + HWLOC_DISC_COMPONENT_TYPE_GLOBAL, + hwloc_opencl_component_instantiate, + 10, /* after pci */ + NULL +}; + +#ifdef HWLOC_INSIDE_PLUGIN +HWLOC_DECLSPEC extern const struct hwloc_component hwloc_opencl_component; +#endif + +const struct hwloc_component hwloc_opencl_component = { + HWLOC_COMPONENT_ABI, + HWLOC_COMPONENT_TYPE_DISC, + 0, + &hwloc_opencl_disc_component +}; diff --git a/ext/hwloc/src/topology-osf.cb b/ext/hwloc/src/topology-osf.cb new file mode 100644 index 000000000..52300fc2e --- /dev/null +++ b/ext/hwloc/src/topology-osf.cb @@ -0,0 +1,389 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2012 Inria. All rights reserved. + * Copyright © 2009-2011 Université Bordeaux 1 + * Copyright © 2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +#include + +#include +#ifdef HAVE_DIRENT_H +#include +#endif +#ifdef HAVE_UNISTD_H +#include +#endif +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include +#include + +/* + * TODO + * + * nsg_init(), nsg_attach_pid(), RAD_MIGRATE/RAD_WAIT + * assign_pid_to_pset() + * + * pthread_use_only_cpu too? 
+ */ + +static int +prepare_radset(hwloc_topology_t topology __hwloc_attribute_unused, radset_t *radset, hwloc_const_bitmap_t hwloc_set) +{ + unsigned cpu; + cpuset_t target_cpuset; + cpuset_t cpuset, xor_cpuset; + radid_t radid; + int ret = 0; + int ret_errno = 0; + int nbnodes = rad_get_num(); + + cpusetcreate(&target_cpuset); + cpuemptyset(target_cpuset); + hwloc_bitmap_foreach_begin(cpu, hwloc_set) + cpuaddset(target_cpuset, cpu); + hwloc_bitmap_foreach_end(); + + cpusetcreate(&cpuset); + cpusetcreate(&xor_cpuset); + for (radid = 0; radid < nbnodes; radid++) { + cpuemptyset(cpuset); + if (rad_get_cpus(radid, cpuset)==-1) { + fprintf(stderr,"rad_get_cpus(%d) failed: %s\n",radid,strerror(errno)); + continue; + } + cpuxorset(target_cpuset, cpuset, xor_cpuset); + if (cpucountset(xor_cpuset) == 0) { + /* Found it */ + radsetcreate(radset); + rademptyset(*radset); + radaddset(*radset, radid); + ret = 1; + goto out; + } + } + /* radset containing exactly this set of CPUs not found */ + ret_errno = EXDEV; + +out: + cpusetdestroy(&target_cpuset); + cpusetdestroy(&cpuset); + cpusetdestroy(&xor_cpuset); + errno = ret_errno; + return ret; +} + +/* Note: get_cpubind not available on OSF */ + +static int +hwloc_osf_set_thread_cpubind(hwloc_topology_t topology, hwloc_thread_t thread, hwloc_const_bitmap_t hwloc_set, int flags) +{ + radset_t radset; + + if (hwloc_bitmap_isequal(hwloc_set, hwloc_topology_get_complete_cpuset(topology))) { + if ((errno = pthread_rad_detach(thread))) + return -1; + return 0; + } + + /* Apparently OSF migrates pages */ + if (flags & HWLOC_CPUBIND_NOMEMBIND) { + errno = ENOSYS; + return -1; + } + + if (!prepare_radset(topology, &radset, hwloc_set)) + return -1; + + if (flags & HWLOC_CPUBIND_STRICT) { + if ((errno = pthread_rad_bind(thread, radset, RAD_INSIST | RAD_WAIT))) + return -1; + } else { + if ((errno = pthread_rad_attach(thread, radset, RAD_WAIT))) + return -1; + } + radsetdestroy(&radset); + + return 0; +} + +static int +hwloc_osf_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_bitmap_t hwloc_set, int flags) +{ + radset_t radset; + + if (hwloc_bitmap_isequal(hwloc_set, hwloc_topology_get_complete_cpuset(topology))) { + if (rad_detach_pid(pid)) + return -1; + return 0; + } + + /* Apparently OSF migrates pages */ + if (flags & HWLOC_CPUBIND_NOMEMBIND) { + errno = ENOSYS; + return -1; + } + + if (!prepare_radset(topology, &radset, hwloc_set)) + return -1; + + if (flags & HWLOC_CPUBIND_STRICT) { + if (rad_bind_pid(pid, radset, RAD_INSIST | RAD_WAIT)) + return -1; + } else { + if (rad_attach_pid(pid, radset, RAD_WAIT)) + return -1; + } + radsetdestroy(&radset); + + return 0; +} + +static int +hwloc_osf_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) +{ + return hwloc_osf_set_thread_cpubind(topology, pthread_self(), hwloc_set, flags); +} + +static int +hwloc_osf_set_thisproc_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) +{ + return hwloc_osf_set_proc_cpubind(topology, getpid(), hwloc_set, flags); +} + +static int +hwloc_osf_prepare_mattr(hwloc_topology_t topology __hwloc_attribute_unused, memalloc_attr_t *mattr, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags __hwloc_attribute_unused) +{ + unsigned long osf_policy; + int node; + + switch (policy) { + case HWLOC_MEMBIND_FIRSTTOUCH: + osf_policy = MPOL_THREAD; + break; + case HWLOC_MEMBIND_DEFAULT: + case HWLOC_MEMBIND_BIND: + osf_policy = MPOL_DIRECTED; + break; + case HWLOC_MEMBIND_INTERLEAVE: + 
osf_policy = MPOL_STRIPPED; + break; + case HWLOC_MEMBIND_REPLICATE: + osf_policy = MPOL_REPLICATED; + break; + default: + errno = ENOSYS; + return -1; + } + + memset(mattr, 0, sizeof(*mattr)); + mattr->mattr_policy = osf_policy; + mattr->mattr_rad = RAD_NONE; + radsetcreate(&mattr->mattr_radset); + rademptyset(mattr->mattr_radset); + + hwloc_bitmap_foreach_begin(node, nodeset) + radaddset(mattr->mattr_radset, node); + hwloc_bitmap_foreach_end(); + return 0; +} + +static int +hwloc_osf_set_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) +{ + memalloc_attr_t mattr; + int behavior = 0; + int ret; + + if (flags & HWLOC_MEMBIND_MIGRATE) + behavior |= MADV_CURRENT; + if (flags & HWLOC_MEMBIND_STRICT) + behavior |= MADV_INSIST; + + if (hwloc_osf_prepare_mattr(topology, &mattr, nodeset, policy, flags)) + return -1; + + ret = nmadvise(addr, len, MADV_CURRENT, &mattr); + radsetdestroy(&mattr.mattr_radset); + return ret; +} + +static void * +hwloc_osf_alloc_membind(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) +{ + memalloc_attr_t mattr; + void *ptr; + + if (hwloc_osf_prepare_mattr(topology, &mattr, nodeset, policy, flags)) + return hwloc_alloc_or_fail(topology, len, flags); + + /* TODO: rather use acreate/amalloc ? */ + ptr = nmmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, + 0, &mattr); + radsetdestroy(&mattr.mattr_radset); + return ptr; +} + +static int +hwloc_look_osf(struct hwloc_backend *backend) +{ + struct hwloc_topology *topology = backend->topology; + cpu_cursor_t cursor; + unsigned nbnodes; + radid_t radid, radid2; + radset_t radset, radset2; + cpuid_t cpuid; + cpuset_t cpuset; + struct hwloc_obj *obj; + unsigned distance; + + if (topology->levels[0][0]->cpuset) + /* somebody discovered things */ + return 0; + + hwloc_alloc_obj_cpusets(topology->levels[0][0]); + + nbnodes = rad_get_num(); + + cpusetcreate(&cpuset); + radsetcreate(&radset); + radsetcreate(&radset2); + { + hwloc_obj_t *nodes = calloc(nbnodes, sizeof(hwloc_obj_t)); + unsigned *indexes = calloc(nbnodes, sizeof(unsigned)); + float *distances = calloc(nbnodes*nbnodes, sizeof(float)); + unsigned nfound; + numa_attr_t attr; + + attr.nattr_type = R_RAD; + attr.nattr_descr.rd_radset = radset; + attr.nattr_flags = 0; + + for (radid = 0; radid < (radid_t) nbnodes; radid++) { + rademptyset(radset); + radaddset(radset, radid); + cpuemptyset(cpuset); + if (rad_get_cpus(radid, cpuset)==-1) { + fprintf(stderr,"rad_get_cpus(%d) failed: %s\n",radid,strerror(errno)); + continue; + } + + indexes[radid] = radid; + nodes[radid] = obj = hwloc_alloc_setup_object(HWLOC_OBJ_NODE, radid); + obj->cpuset = hwloc_bitmap_alloc(); + obj->memory.local_memory = rad_get_physmem(radid) * hwloc_getpagesize(); + obj->memory.page_types_len = 2; + obj->memory.page_types = malloc(2*sizeof(*obj->memory.page_types)); + memset(obj->memory.page_types, 0, 2*sizeof(*obj->memory.page_types)); + obj->memory.page_types[0].size = hwloc_getpagesize(); +#ifdef HAVE__SC_LARGE_PAGESIZE + obj->memory.page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); +#endif + + cursor = SET_CURSOR_INIT; + while((cpuid = cpu_foreach(cpuset, 0, &cursor)) != CPU_NONE) + hwloc_bitmap_set(obj->cpuset, cpuid); + + hwloc_debug_1arg_bitmap("node %d has cpuset %s\n", + radid, obj->cpuset); + + hwloc_insert_object_by_cpuset(topology, obj); + + nfound = 0; + for (radid2 = 0; radid2 < (radid_t) nbnodes; radid2++) + 
distances[radid*nbnodes+radid2] = RAD_DIST_REMOTE;
+      for (distance = RAD_DIST_LOCAL; distance < RAD_DIST_REMOTE; distance++) {
+        attr.nattr_distance = distance;
+        /* get set of NUMA nodes at distance <= DISTANCE */
+        if (nloc(&attr, radset2)) {
+          fprintf(stderr,"nloc failed: %s\n", strerror(errno));
+          continue;
+        }
+        cursor = SET_CURSOR_INIT;
+        while ((radid2 = rad_foreach(radset2, 0, &cursor)) != RAD_NONE) {
+          if (distances[radid*nbnodes+radid2] == RAD_DIST_REMOTE) {
+            distances[radid*nbnodes+radid2] = (float) distance;
+            nfound++;
+          }
+        }
+        if (nfound == nbnodes)
+          /* Finished finding distances, no need to go up to RAD_DIST_REMOTE */
+          break;
+      }
+    }
+
+    hwloc_distances_set(topology, HWLOC_OBJ_NODE, nbnodes, indexes, nodes, distances, 0 /* OS cannot force */);
+  }
+  radsetdestroy(&radset2);
+  radsetdestroy(&radset);
+  cpusetdestroy(&cpuset);
+
+  /* add PU objects */
+  hwloc_setup_pu_level(topology, hwloc_fallback_nbprocessors(topology));
+
+  hwloc_obj_add_info(topology->levels[0][0], "Backend", "OSF");
+  if (topology->is_thissystem)
+    hwloc_add_uname_info(topology);
+  return 1;
+}
+
+void
+hwloc_set_osf_hooks(struct hwloc_binding_hooks *hooks,
+                    struct hwloc_topology_support *support)
+{
+  hooks->set_thread_cpubind = hwloc_osf_set_thread_cpubind;
+  hooks->set_thisthread_cpubind = hwloc_osf_set_thisthread_cpubind;
+  hooks->set_proc_cpubind = hwloc_osf_set_proc_cpubind;
+  hooks->set_thisproc_cpubind = hwloc_osf_set_thisproc_cpubind;
+  hooks->set_area_membind = hwloc_osf_set_area_membind;
+  hooks->alloc_membind = hwloc_osf_alloc_membind;
+  hooks->alloc = hwloc_alloc_mmap;
+  hooks->free_membind = hwloc_free_mmap;
+  support->membind->firsttouch_membind = 1;
+  support->membind->bind_membind = 1;
+  support->membind->interleave_membind = 1;
+  support->membind->replicate_membind = 1;
+}
+
+static struct hwloc_backend *
+hwloc_osf_component_instantiate(struct hwloc_disc_component *component,
+                                const void *_data1 __hwloc_attribute_unused,
+                                const void *_data2 __hwloc_attribute_unused,
+                                const void *_data3 __hwloc_attribute_unused)
+{
+  struct hwloc_backend *backend;
+  backend = hwloc_backend_alloc(component);
+  if (!backend)
+    return NULL;
+  backend->discover = hwloc_look_osf;
+  return backend;
+}
+
+static struct hwloc_disc_component hwloc_osf_disc_component = {
+  HWLOC_DISC_COMPONENT_TYPE_CPU,
+  "osf",
+  HWLOC_DISC_COMPONENT_TYPE_GLOBAL,
+  hwloc_osf_component_instantiate,
+  50,
+  NULL
+};
+
+const struct hwloc_component hwloc_osf_component = {
+  HWLOC_COMPONENT_ABI,
+  HWLOC_COMPONENT_TYPE_DISC,
+  0,
+  &hwloc_osf_disc_component
+};
diff --git a/ext/hwloc/src/topology-pci.c b/ext/hwloc/src/topology-pci.c
new file mode 100644
index 000000000..a185d8635
--- /dev/null
+++ b/ext/hwloc/src/topology-pci.c
@@ -0,0 +1,411 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2013 Inria. All rights reserved.
+ * Copyright © 2009-2011, 2013 Université Bordeaux 1
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <hwloc/plugins.h>
+#include <hwloc/helper.h>
+
+/* private headers allowed for convenience because this plugin is built within hwloc */
+#include <private/debug.h>
+#include <private/misc.h>
+
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <assert.h>
+#include <stdarg.h>
+#include <setjmp.h>
+
+#if (defined HWLOC_HAVE_LIBPCIACCESS) && (defined HWLOC_HAVE_PCIUTILS)
+#error Cannot have both LIBPCIACCESS and PCIUTILS enabled simultaneously
+#elif (!defined HWLOC_HAVE_LIBPCIACCESS) && (!defined HWLOC_HAVE_PCIUTILS)
+#error Cannot have both LIBPCIACCESS and PCIUTILS disabled simultaneously
+#endif
+
+#ifdef HWLOC_HAVE_LIBPCIACCESS
+#include <pciaccess.h>
+#else /* HWLOC_HAVE_PCIUTILS */
+#include <pci/pci.h>
+#endif
+
+#ifndef PCI_HEADER_TYPE
+#define PCI_HEADER_TYPE 0x0e
+#endif
+#ifndef PCI_HEADER_TYPE_BRIDGE
+#define PCI_HEADER_TYPE_BRIDGE 1
+#endif
+
+#ifndef PCI_CLASS_DEVICE
+#define PCI_CLASS_DEVICE 0x0a
+#endif
+#ifndef PCI_CLASS_BRIDGE_PCI
+#define PCI_CLASS_BRIDGE_PCI 0x0604
+#endif
+
+#ifndef PCI_REVISION_ID
+#define PCI_REVISION_ID 0x08
+#endif
+
+#ifndef PCI_SUBSYSTEM_VENDOR_ID
+#define PCI_SUBSYSTEM_VENDOR_ID 0x2c
+#endif
+#ifndef PCI_SUBSYSTEM_ID
+#define PCI_SUBSYSTEM_ID 0x2e
+#endif
+
+#ifndef PCI_PRIMARY_BUS
+#define PCI_PRIMARY_BUS 0x18
+#endif
+#ifndef PCI_SECONDARY_BUS
+#define PCI_SECONDARY_BUS 0x19
+#endif
+#ifndef PCI_SUBORDINATE_BUS
+#define PCI_SUBORDINATE_BUS 0x1a
+#endif
+
+#ifndef PCI_CAP_ID_EXP
+#define PCI_CAP_ID_EXP 0x10
+#endif
+
+#ifndef PCI_CAP_NORMAL
+#define PCI_CAP_NORMAL 1
+#endif
+
+#define CONFIG_SPACE_CACHESIZE 256
+
+
+#ifdef HWLOC_HAVE_PCIUTILS
+/* Avoid letting libpci call exit(1) when no PCI bus is available. */
+static jmp_buf err_buf;
+static void
+hwloc_pci_error(char *msg, ...)
+{
+  va_list args;
+
+  va_start(args, msg);
+  fprintf(stderr, "pcilib: ");
+  vfprintf(stderr, msg, args);
+  fprintf(stderr, "\n");
+  longjmp(err_buf, 1);
+}
+
+static void
+hwloc_pci_warning(char *msg __hwloc_attribute_unused, ...)
+{ +} +#endif + +static int +hwloc_look_pci(struct hwloc_backend *backend) +{ + struct hwloc_topology *topology = backend->topology; + struct hwloc_obj *first_obj = NULL, *last_obj = NULL; +#ifdef HWLOC_HAVE_LIBPCIACCESS + int ret; + struct pci_device_iterator *iter; + struct pci_device *pcidev; +#else /* HWLOC_HAVE_PCIUTILS */ + struct pci_access *pciaccess; + struct pci_dev *pcidev; +#endif + + if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) + return 0; + + if (hwloc_get_next_pcidev(topology, NULL)) { + hwloc_debug("%s", "PCI objects already added, ignoring pci backend.\n"); + return 0; + } + + if (!hwloc_topology_is_thissystem(topology)) { + hwloc_debug("%s", "\nno PCI detection (not thissystem)\n"); + return 0; + } + + hwloc_debug("%s", "\nScanning PCI buses...\n"); + + /* initialize PCI scanning */ +#ifdef HWLOC_HAVE_LIBPCIACCESS + ret = pci_system_init(); + if (ret) { + hwloc_debug("%s", "Can not initialize libpciaccess\n"); + return -1; + } + + iter = pci_slot_match_iterator_create(NULL); +#else /* HWLOC_HAVE_PCIUTILS */ + pciaccess = pci_alloc(); + pciaccess->error = hwloc_pci_error; + pciaccess->warning = hwloc_pci_warning; + + if (setjmp(err_buf)) { + pci_cleanup(pciaccess); + return -1; + } + + pci_init(pciaccess); + pci_scan_bus(pciaccess); +#endif + + /* iterate over devices */ +#ifdef HWLOC_HAVE_LIBPCIACCESS + for (pcidev = pci_device_next(iter); + pcidev; + pcidev = pci_device_next(iter)) +#else /* HWLOC_HAVE_PCIUTILS */ + for (pcidev = pciaccess->devices; + pcidev; + pcidev = pcidev->next) +#endif + { + const char *vendorname, *devicename, *fullname; + unsigned char config_space_cache[CONFIG_SPACE_CACHESIZE]; + struct hwloc_obj *obj; + unsigned os_index; + unsigned domain; + unsigned device_class; + unsigned short tmp16; + char name[128]; + unsigned offset; +#ifdef HWLOC_HAVE_PCI_FIND_CAP + struct pci_cap *cap; +#endif + + /* initialize the config space in case we fail to read it (missing permissions, etc). 
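+ * (0xff is what a config-space read of an absent device returns, so the
+ * parsing below degrades gracefully rather than seeing plausible zeros.)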
+ */
+  memset(config_space_cache, 0xff, CONFIG_SPACE_CACHESIZE);
+#ifdef HWLOC_HAVE_LIBPCIACCESS
+  pci_device_probe(pcidev);
+  pci_device_cfg_read(pcidev, config_space_cache, 0, CONFIG_SPACE_CACHESIZE, NULL);
+#else /* HWLOC_HAVE_PCIUTILS */
+  pci_read_block(pcidev, 0, config_space_cache, CONFIG_SPACE_CACHESIZE); /* doesn't even tell how much it actually reads */
+#endif
+
+  /* try to read the domain */
+#if (defined HWLOC_HAVE_LIBPCIACCESS) || (defined HWLOC_HAVE_PCIDEV_DOMAIN)
+  domain = pcidev->domain;
+#else
+  domain = 0; /* default domain number */
+#endif
+
+  /* try to read the device_class */
+#ifdef HWLOC_HAVE_LIBPCIACCESS
+  device_class = pcidev->device_class >> 8;
+#else /* HWLOC_HAVE_PCIUTILS */
+#ifdef HWLOC_HAVE_PCIDEV_DEVICE_CLASS
+  device_class = pcidev->device_class;
+#else
+  device_class = config_space_cache[PCI_CLASS_DEVICE] | (config_space_cache[PCI_CLASS_DEVICE+1] << 8);
+#endif
+#endif
+
+  /* might be useful for debugging (note that domain might be truncated) */
+  os_index = (domain << 20) + (pcidev->bus << 12) + (pcidev->dev << 4) + pcidev->func;
+
+  obj = hwloc_alloc_setup_object(HWLOC_OBJ_PCI_DEVICE, os_index);
+  obj->attr->pcidev.domain = domain;
+  obj->attr->pcidev.bus = pcidev->bus;
+  obj->attr->pcidev.dev = pcidev->dev;
+  obj->attr->pcidev.func = pcidev->func;
+  obj->attr->pcidev.vendor_id = pcidev->vendor_id;
+  obj->attr->pcidev.device_id = pcidev->device_id;
+  obj->attr->pcidev.class_id = device_class;
+  obj->attr->pcidev.revision = config_space_cache[PCI_REVISION_ID];
+
+  obj->attr->pcidev.linkspeed = 0; /* unknown */
+#ifdef HWLOC_HAVE_PCI_FIND_CAP
+  cap = pci_find_cap(pcidev, PCI_CAP_ID_EXP, PCI_CAP_NORMAL);
+  offset = cap ? cap->addr : 0;
+#else
+  offset = hwloc_pci_find_cap(config_space_cache, PCI_CAP_ID_EXP);
+#endif /* HWLOC_HAVE_PCI_FIND_CAP */
+
+  if (0xffff == pcidev->vendor_id && 0xffff == pcidev->device_id) {
+  /* SR-IOV puts ffff:ffff in Virtual Function config space.
+   * The actual VF device ID is stored at a special (dynamic) location in the Physical Function config space.
+   * VF and PF have the same vendor ID.
+   *
+   * libpciaccess just returns ffff:ffff, needs to be fixed.
+   * linuxpci is OK because sysfs files are already fixed by the kernel.
+   * pciutils is OK when it uses those Linux sysfs files.
+   *
+   * Reading these files is an easy way to work around the libpciaccess issue on Linux,
+   * but we have no way to know if this is caused by SR-IOV or not.
+   *
+   * TODO:
+   * If PF has CAP_ID_PCIX or CAP_ID_EXP (offset>0),
+   * look for extended capability PCI_EXT_CAP_ID_SRIOV (need extended config space (more than 256 bytes)),
+   * then read the VF device ID after it (PCI_IOV_DID bytes later).
+   * Needs access to extended config space (needs root on Linux).
+   * TODO:
+   * Add string info attributes in VF and PF objects?
+   */
+#ifdef HWLOC_LINUX_SYS
+ /* Workaround for Linux (the kernel returns the VF device/vendor IDs).
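+ * The sysfs vendor/device attributes are filled in by the kernel from its
+ * own knowledge of the VF, not from a raw config-space read, so they stay
+ * usable even while config space itself still shows ffff:ffff.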
+ */
+  char path[64];
+  char value[16];
+  FILE *file;
+  snprintf(path, sizeof(path), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/vendor",
+           domain, pcidev->bus, pcidev->dev, pcidev->func);
+  file = fopen(path, "r");
+  if (file) {
+    memset(value, 0, sizeof(value)); /* ensure NUL-termination for strtoul() */
+    if (fread(value, 1, sizeof(value)-1, file) > 0)
+      obj->attr->pcidev.vendor_id = strtoul(value, NULL, 16);
+    fclose(file);
+  }
+  snprintf(path, sizeof(path), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/device",
+           domain, pcidev->bus, pcidev->dev, pcidev->func);
+  file = fopen(path, "r");
+  if (file) {
+    memset(value, 0, sizeof(value));
+    if (fread(value, 1, sizeof(value)-1, file) > 0)
+      obj->attr->pcidev.device_id = strtoul(value, NULL, 16);
+    fclose(file);
+  }
+#endif
+  }
+
+  if (offset > 0 && offset + 20 /* size of PCI express block up to link status */ <= CONFIG_SPACE_CACHESIZE)
+    hwloc_pci_find_linkspeed(config_space_cache, offset, &obj->attr->pcidev.linkspeed);
+
+  hwloc_pci_prepare_bridge(obj, config_space_cache);
+
+  if (obj->type == HWLOC_OBJ_PCI_DEVICE) {
+    memcpy(&tmp16, &config_space_cache[PCI_SUBSYSTEM_VENDOR_ID], sizeof(tmp16));
+    obj->attr->pcidev.subvendor_id = tmp16;
+    memcpy(&tmp16, &config_space_cache[PCI_SUBSYSTEM_ID], sizeof(tmp16));
+    obj->attr->pcidev.subdevice_id = tmp16;
+  } else {
+    /* TODO:
+     * bridge must lookup PCI_CAP_ID_SSVID and then look at offset+PCI_SSVID_VENDOR/DEVICE_ID
+     * cardbus must look at PCI_CB_SUBSYSTEM_VENDOR_ID and PCI_CB_SUBSYSTEM_ID
+     */
+  }
+
+  /* starting from pciutils 2.2, pci_lookup_name() takes a variable number
+   * of arguments, and supports the PCI_LOOKUP_NO_NUMBERS flag.
+   */
+
+  /* get the vendor name */
+#ifdef HWLOC_HAVE_LIBPCIACCESS
+  vendorname = pci_device_get_vendor_name(pcidev);
+#else /* HWLOC_HAVE_PCIUTILS */
+  vendorname = pci_lookup_name(pciaccess, name, sizeof(name),
+#if HAVE_DECL_PCI_LOOKUP_NO_NUMBERS
+                               PCI_LOOKUP_VENDOR|PCI_LOOKUP_NO_NUMBERS,
+                               pcidev->vendor_id
+#else
+                               PCI_LOOKUP_VENDOR,
+                               pcidev->vendor_id, 0, 0, 0
+#endif
+                               );
+#endif /* HWLOC_HAVE_PCIUTILS */
+  if (vendorname && *vendorname)
+    hwloc_obj_add_info(obj, "PCIVendor", vendorname);
+
+  /* get the device name */
+#ifdef HWLOC_HAVE_LIBPCIACCESS
+  devicename = pci_device_get_device_name(pcidev);
+#else /* HWLOC_HAVE_PCIUTILS */
+  devicename = pci_lookup_name(pciaccess, name, sizeof(name),
+#if HAVE_DECL_PCI_LOOKUP_NO_NUMBERS
+                               PCI_LOOKUP_DEVICE|PCI_LOOKUP_NO_NUMBERS,
+                               pcidev->vendor_id, pcidev->device_id
+#else
+                               PCI_LOOKUP_DEVICE,
+                               pcidev->vendor_id, pcidev->device_id, 0, 0
+#endif
+                               );
+#endif /* HWLOC_HAVE_PCIUTILS */
+  if (devicename && *devicename)
+    hwloc_obj_add_info(obj, "PCIDevice", devicename);
+
+  /* generate or get the fullname */
+#ifdef HWLOC_HAVE_LIBPCIACCESS
+  snprintf(name, sizeof(name), "%s%s%s",
+           vendorname ? vendorname : "",
+           vendorname && devicename ? " " : "",
+           devicename ? devicename : "");
+  fullname = name;
+  if (*name)
+    obj->name = strdup(name);
+#else /* HWLOC_HAVE_PCIUTILS */
+  fullname = pci_lookup_name(pciaccess, name, sizeof(name),
+#if HAVE_DECL_PCI_LOOKUP_NO_NUMBERS
+                             PCI_LOOKUP_VENDOR|PCI_LOOKUP_DEVICE|PCI_LOOKUP_NO_NUMBERS,
+                             pcidev->vendor_id, pcidev->device_id
+#else
+                             PCI_LOOKUP_VENDOR|PCI_LOOKUP_DEVICE,
+                             pcidev->vendor_id, pcidev->device_id, 0, 0
+#endif
+                             );
+  if (fullname && *fullname)
+    obj->name = strdup(fullname);
+#endif /* HWLOC_HAVE_PCIUTILS */
+  hwloc_debug(" %04x:%02x:%02x.%01x %04x %04x:%04x %s\n",
+              domain, pcidev->bus, pcidev->dev, pcidev->func,
+              device_class, pcidev->vendor_id, pcidev->device_id,
+              fullname && *fullname ?
fullname : "??"); + + /* queue the object for now */ + if (first_obj) + last_obj->next_sibling = obj; + else + first_obj = obj; + last_obj = obj; + } + + /* finalize device scanning */ +#ifdef HWLOC_HAVE_LIBPCIACCESS + pci_iterator_destroy(iter); + pci_system_cleanup(); +#else /* HWLOC_HAVE_PCIUTILS */ + pci_cleanup(pciaccess); +#endif + + return hwloc_insert_pci_device_list(backend, first_obj); +} + +static struct hwloc_backend * +hwloc_pci_component_instantiate(struct hwloc_disc_component *component, + const void *_data1 __hwloc_attribute_unused, + const void *_data2 __hwloc_attribute_unused, + const void *_data3 __hwloc_attribute_unused) +{ + struct hwloc_backend *backend; + + if (hwloc_plugin_check_namespace(component->name, "hwloc_backend_alloc") < 0) + return NULL; + + /* thissystem may not be fully initialized yet, we'll check flags in discover() */ + + backend = hwloc_backend_alloc(component); + if (!backend) + return NULL; + backend->flags = HWLOC_BACKEND_FLAG_NEED_LEVELS; + backend->discover = hwloc_look_pci; + return backend; +} + +static struct hwloc_disc_component hwloc_pci_disc_component = { + HWLOC_DISC_COMPONENT_TYPE_MISC, + "pci", + HWLOC_DISC_COMPONENT_TYPE_GLOBAL, + hwloc_pci_component_instantiate, + 20, + NULL +}; + +#ifdef HWLOC_INSIDE_PLUGIN +HWLOC_DECLSPEC extern const struct hwloc_component hwloc_pci_component; +#endif + +const struct hwloc_component hwloc_pci_component = { + HWLOC_COMPONENT_ABI, + HWLOC_COMPONENT_TYPE_DISC, + 0, + &hwloc_pci_disc_component +}; diff --git a/ext/hwloc/src/topology-synthetic.c b/ext/hwloc/src/topology-synthetic.c new file mode 100644 index 000000000..11c8c333e --- /dev/null +++ b/ext/hwloc/src/topology-synthetic.c @@ -0,0 +1,444 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2012 Inria. All rights reserved. + * Copyright © 2009-2010 Université Bordeaux 1 + * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +#include +#include +#include +#include +#include + +#include +#include +#ifdef HAVE_STRINGS_H +#include +#endif + +struct hwloc_synthetic_backend_data_s { + /* synthetic backend parameters */ + char *string; +#define HWLOC_SYNTHETIC_MAX_DEPTH 128 + unsigned arity[HWLOC_SYNTHETIC_MAX_DEPTH]; + hwloc_obj_type_t type[HWLOC_SYNTHETIC_MAX_DEPTH]; + unsigned id[HWLOC_SYNTHETIC_MAX_DEPTH]; + unsigned depth[HWLOC_SYNTHETIC_MAX_DEPTH]; /* For cache/misc */ +}; + +/* Read from description a series of integers describing a symmetrical + topology and update the hwloc_synthetic_backend_data_s accordingly. On + success, return zero. 
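+   A description is a space-separated list of arities, each optionally
+   prefixed with an object type and a colon; for instance
+   "node:2 socket:2 core:2 pu:2" describes two NUMA nodes of two sockets
+   of two cores with two hardware threads each.  Types that are left out
+   are inferred from the neighboring levels.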
*/ +static int +hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data, + const char *description) +{ + const char *pos, *next_pos; + unsigned long item, count; + unsigned i; + int cache_depth = 0, group_depth = 0; + int nb_machine_levels = 0, nb_node_levels = 0; + int nb_pu_levels = 0; + int verbose = 0; + char *env = getenv("HWLOC_SYNTHETIC_VERBOSE"); + + if (env) + verbose = atoi(env); + + for (pos = description, count = 1; *pos; pos = next_pos) { +#define HWLOC_OBJ_TYPE_UNKNOWN ((hwloc_obj_type_t) -1) + hwloc_obj_type_t type = HWLOC_OBJ_TYPE_UNKNOWN; + + while (*pos == ' ') + pos++; + + if (!*pos) + break; + + if (*pos < '0' || *pos > '9') { + if (!hwloc_namecoloncmp(pos, "machines", 2)) { + type = HWLOC_OBJ_MACHINE; + } else if (!hwloc_namecoloncmp(pos, "nodes", 1)) + type = HWLOC_OBJ_NODE; + else if (!hwloc_namecoloncmp(pos, "sockets", 1)) + type = HWLOC_OBJ_SOCKET; + else if (!hwloc_namecoloncmp(pos, "cores", 2)) + type = HWLOC_OBJ_CORE; + else if (!hwloc_namecoloncmp(pos, "caches", 2)) + type = HWLOC_OBJ_CACHE; + else if (!hwloc_namecoloncmp(pos, "pus", 1)) + type = HWLOC_OBJ_PU; + else if (!hwloc_namecoloncmp(pos, "misc", 2)) + type = HWLOC_OBJ_MISC; + else if (!hwloc_namecoloncmp(pos, "group", 2)) + type = HWLOC_OBJ_GROUP; + else if (verbose) + fprintf(stderr, "Synthetic string with unknown object type `%s'\n", pos); + + next_pos = strchr(pos, ':'); + if (!next_pos) { + if (verbose) + fprintf(stderr,"Synthetic string doesn't have a `:' after object type at '%s'\n", pos); + errno = EINVAL; + return -1; + } + pos = next_pos + 1; + } + item = strtoul(pos, (char **)&next_pos, 0); + if (next_pos == pos) { + if (verbose) + fprintf(stderr,"Synthetic string doesn't have a number of objects at '%s'\n", pos); + errno = EINVAL; + return -1; + } + + if (count + 1 >= HWLOC_SYNTHETIC_MAX_DEPTH) { + if (verbose) + fprintf(stderr,"Too many synthetic levels, max %d\n", HWLOC_SYNTHETIC_MAX_DEPTH); + errno = EINVAL; + return -1; + } + if (item > UINT_MAX) { + if (verbose) + fprintf(stderr,"Too big arity, max %u\n", UINT_MAX); + errno = EINVAL; + return -1; + } + + data->arity[count-1] = (unsigned)item; + data->type[count] = type; + count++; + } + + if (count <= 0) { + if (verbose) + fprintf(stderr, "Synthetic string doesn't contain any object\n"); + errno = EINVAL; + return -1; + } + + for(i=count-1; i>0; i--) { + hwloc_obj_type_t type; + + type = data->type[i]; + + if (type == HWLOC_OBJ_TYPE_UNKNOWN) { + if (i == count-1) + type = HWLOC_OBJ_PU; + else { + switch (data->type[i+1]) { + case HWLOC_OBJ_PU: type = HWLOC_OBJ_CORE; break; + case HWLOC_OBJ_CORE: type = HWLOC_OBJ_CACHE; break; + case HWLOC_OBJ_CACHE: type = HWLOC_OBJ_SOCKET; break; + case HWLOC_OBJ_SOCKET: type = HWLOC_OBJ_NODE; break; + case HWLOC_OBJ_NODE: + case HWLOC_OBJ_GROUP: type = HWLOC_OBJ_GROUP; break; + case HWLOC_OBJ_MACHINE: + case HWLOC_OBJ_MISC: type = HWLOC_OBJ_MISC; break; + default: + assert(0); + } + } + data->type[i] = type; + } + switch (type) { + case HWLOC_OBJ_PU: + if (nb_pu_levels) { + if (verbose) + fprintf(stderr, "Synthetic string can not have several PU levels\n"); + errno = EINVAL; + return -1; + } + nb_pu_levels++; + break; + case HWLOC_OBJ_CACHE: + cache_depth++; + break; + case HWLOC_OBJ_GROUP: + group_depth++; + break; + case HWLOC_OBJ_NODE: + nb_node_levels++; + break; + case HWLOC_OBJ_MACHINE: + nb_machine_levels++; + break; + default: + break; + } + } + + if (!nb_pu_levels) { + if (verbose) + fprintf(stderr, "Synthetic string missing ending number of PUs\n"); + errno = EINVAL; + return 
-1;
+  }
+
+  if (nb_pu_levels > 1) {
+    if (verbose)
+      fprintf(stderr, "Synthetic string can not have several PU levels\n");
+    errno = EINVAL;
+    return -1;
+  }
+  if (nb_node_levels > 1) {
+    if (verbose)
+      fprintf(stderr, "Synthetic string can not have several NUMA node levels\n");
+    errno = EINVAL;
+    return -1;
+  }
+  if (nb_machine_levels > 1) {
+    if (verbose)
+      fprintf(stderr, "Synthetic string can not have several machine levels\n");
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (nb_machine_levels)
+    data->type[0] = HWLOC_OBJ_SYSTEM;
+  else {
+    data->type[0] = HWLOC_OBJ_MACHINE;
+    nb_machine_levels++;
+  }
+
+  if (cache_depth == 1)
+    /* if there is a single cache level, make it L2 */
+    cache_depth = 2;
+
+  for (i=0; i<count; i++) {
+    hwloc_obj_type_t type = data->type[i];
+
+    if (type == HWLOC_OBJ_GROUP)
+      data->depth[i] = group_depth--;
+    else if (type == HWLOC_OBJ_CACHE)
+      data->depth[i] = cache_depth--;
+  }
+
+  data->string = strdup(description);
+  data->arity[count-1] = 0;
+
+  return 0;
+}
+
+/*
+ * Recursively build objects whose cpu start at first_cpu
+ * - level gives where to look in the type, arity and id arrays
+ * - the id array is used as a variable to get unique IDs for a given level.
+ * - generated memory should be added to *memory_kB.
+ * - generated cpus should be added to parent_cpuset.
+ * - next cpu number to be used should be returned.
+ */
+static unsigned
+hwloc__look_synthetic(struct hwloc_topology *topology,
+                      struct hwloc_synthetic_backend_data_s *data,
+                      int level, unsigned first_cpu,
+                      hwloc_bitmap_t parent_cpuset)
+{
+  hwloc_obj_t obj;
+  unsigned i;
+  hwloc_obj_type_t type = data->type[level];
+
+  /* pre-hooks */
+  switch (type) {
+  case HWLOC_OBJ_MISC:
+    break;
+  case HWLOC_OBJ_GROUP:
+    break;
+  case HWLOC_OBJ_SYSTEM:
+  case HWLOC_OBJ_BRIDGE:
+  case HWLOC_OBJ_PCI_DEVICE:
+  case HWLOC_OBJ_OS_DEVICE:
+    /* Shouldn't happen. */
+    abort();
+    break;
+  case HWLOC_OBJ_MACHINE:
+    break;
+  case HWLOC_OBJ_NODE:
+    break;
+  case HWLOC_OBJ_SOCKET:
+    break;
+  case HWLOC_OBJ_CACHE:
+    break;
+  case HWLOC_OBJ_CORE:
+    break;
+  case HWLOC_OBJ_PU:
+    break;
+  case HWLOC_OBJ_TYPE_MAX:
+    /* Should never happen */
+    assert(0);
+    break;
+  }
+
+  obj = hwloc_alloc_setup_object(type, data->id[level]++);
+  obj->cpuset = hwloc_bitmap_alloc();
+
+  if (!data->arity[level]) {
+    hwloc_bitmap_set(obj->cpuset, first_cpu++);
+  } else {
+    for (i = 0; i < data->arity[level]; i++)
+      first_cpu = hwloc__look_synthetic(topology, data, level + 1, first_cpu, obj->cpuset);
+  }
+
+  if (type == HWLOC_OBJ_NODE) {
+    obj->nodeset = hwloc_bitmap_alloc();
+    hwloc_bitmap_set(obj->nodeset, obj->os_index);
+  }
+
+  hwloc_bitmap_or(parent_cpuset, parent_cpuset, obj->cpuset);
+
+  /* post-hooks */
+  switch (type) {
+  case HWLOC_OBJ_MISC:
+    break;
+  case HWLOC_OBJ_GROUP:
+    obj->attr->group.depth = data->depth[level];
+    break;
+  case HWLOC_OBJ_SYSTEM:
+  case HWLOC_OBJ_BRIDGE:
+  case HWLOC_OBJ_PCI_DEVICE:
+  case HWLOC_OBJ_OS_DEVICE:
+    abort();
+    break;
+  case HWLOC_OBJ_MACHINE:
+    break;
+  case HWLOC_OBJ_NODE:
+    /* 1GB in memory nodes, 256k 4k-pages.
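+ * (1GB = 262144 pages of 4096 bytes; these are purely synthetic defaults.)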
*/ + obj->memory.local_memory = 1024*1024*1024; + obj->memory.page_types_len = 1; + obj->memory.page_types = malloc(sizeof(*obj->memory.page_types)); + memset(obj->memory.page_types, 0, sizeof(*obj->memory.page_types)); + obj->memory.page_types[0].size = 4096; + obj->memory.page_types[0].count = 256*1024; + break; + case HWLOC_OBJ_SOCKET: + break; + case HWLOC_OBJ_CACHE: + obj->attr->cache.depth = data->depth[level]; + obj->attr->cache.linesize = 64; + if (obj->attr->cache.depth == 1) { + /* 32Kb in L1d */ + obj->attr->cache.size = 32*1024; + obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA; + } else { + /* *4 at each level, starting from 1MB for L2, unified */ + obj->attr->cache.size = 256*1024 << (2*obj->attr->cache.depth); + obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; + } + break; + case HWLOC_OBJ_CORE: + break; + case HWLOC_OBJ_PU: + break; + case HWLOC_OBJ_TYPE_MAX: + /* Should never happen */ + assert(0); + break; + } + + hwloc_insert_object_by_cpuset(topology, obj); + + return first_cpu; +} + +static int +hwloc_look_synthetic(struct hwloc_backend *backend) +{ + struct hwloc_topology *topology = backend->topology; + struct hwloc_synthetic_backend_data_s *data = backend->private_data; + hwloc_bitmap_t cpuset = hwloc_bitmap_alloc(); + unsigned first_cpu = 0, i; + + assert(!topology->levels[0][0]->cpuset); + + hwloc_alloc_obj_cpusets(topology->levels[0][0]); + + topology->support.discovery->pu = 1; + + /* start with id=0 for each level */ + for (i = 0; data->arity[i] > 0; i++) + data->id[i] = 0; + /* ... including the last one */ + data->id[i] = 0; + + /* update first level type according to the synthetic type array */ + topology->levels[0][0]->type = data->type[0]; + + for (i = 0; i < data->arity[0]; i++) + first_cpu = hwloc__look_synthetic(topology, data, 1, first_cpu, cpuset); + + hwloc_bitmap_free(cpuset); + + hwloc_obj_add_info(topology->levels[0][0], "Backend", "Synthetic"); + hwloc_obj_add_info(topology->levels[0][0], "SyntheticDescription", data->string); + return 1; +} + +static void +hwloc_synthetic_backend_disable(struct hwloc_backend *backend) +{ + struct hwloc_synthetic_backend_data_s *data = backend->private_data; + free(data->string); + free(data); +} + +static struct hwloc_backend * +hwloc_synthetic_component_instantiate(struct hwloc_disc_component *component, + const void *_data1, + const void *_data2 __hwloc_attribute_unused, + const void *_data3 __hwloc_attribute_unused) +{ + struct hwloc_backend *backend; + struct hwloc_synthetic_backend_data_s *data; + int err; + + if (!_data1) { + errno = EINVAL; + goto out; + } + + backend = hwloc_backend_alloc(component); + if (!backend) + goto out; + + data = malloc(sizeof(*data)); + if (!data) { + errno = ENOMEM; + goto out_with_backend; + } + + err = hwloc_backend_synthetic_init(data, (const char *) _data1); + if (err < 0) + goto out_with_data; + + backend->private_data = data; + backend->discover = hwloc_look_synthetic; + backend->disable = hwloc_synthetic_backend_disable; + backend->is_thissystem = 0; + + return backend; + + out_with_data: + free(data); + out_with_backend: + free(backend); + out: + return NULL; +} + +static struct hwloc_disc_component hwloc_synthetic_disc_component = { + HWLOC_DISC_COMPONENT_TYPE_GLOBAL, + "synthetic", + ~0, + hwloc_synthetic_component_instantiate, + 30, + NULL +}; + +const struct hwloc_component hwloc_synthetic_component = { + HWLOC_COMPONENT_ABI, + HWLOC_COMPONENT_TYPE_DISC, + 0, + &hwloc_synthetic_disc_component +}; diff --git a/ext/hwloc/src/topology-x86.c b/ext/hwloc/src/topology-x86.c 
new file mode 100644 index 000000000..2f0db171f --- /dev/null +++ b/ext/hwloc/src/topology-x86.c @@ -0,0 +1,972 @@ +/* + * Copyright © 2010-2014 Inria. All rights reserved. + * Copyright © 2010-2013 Université Bordeaux 1 + * Copyright © 2010-2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + * + * + * This backend is only used when the operating system does not export + * the necessary hardware topology information to user-space applications. + * Currently, only the FreeBSD backend relies on this x86 backend. + * + * Other backends such as Linux have their own way to retrieve various + * pieces of hardware topology information from the operating system + * on various architectures, without having to use this x86-specific code. + */ + +#include +#include +#include +#include +#include + +#include + +#define has_topoext(features) ((features)[6] & (1 << 22)) + +struct cacheinfo { + unsigned type; + unsigned level; + unsigned nbthreads_sharing; + + unsigned linesize; + unsigned linepart; + int ways; + unsigned sets; + unsigned size; +}; + +struct procinfo { + unsigned present; + unsigned apicid; + unsigned max_log_proc; + unsigned max_nbcores; + unsigned max_nbthreads; + unsigned socketid; + unsigned nodeid; + unsigned unitid; + unsigned logprocid; + unsigned threadid; + unsigned coreid; + unsigned *otherids; + unsigned levels; + unsigned numcaches; + struct cacheinfo *cache; + char cpuvendor[13]; + char cpumodel[3*4*4+1]; + unsigned cpumodelnumber; + unsigned cpufamilynumber; +}; + +enum cpuid_type { + intel, + amd, + unknown +}; + +static void fill_amd_cache(struct procinfo *infos, unsigned level, unsigned cpuid) +{ + struct cacheinfo *cache; + unsigned cachenum; + unsigned size = 0; + + if (level == 1) + size = ((cpuid >> 24)) << 10; + else if (level == 2) + size = ((cpuid >> 16)) << 10; + else if (level == 3) + size = ((cpuid >> 18)) << 19; + if (!size) + return; + + cachenum = infos->numcaches++; + infos->cache = realloc(infos->cache, infos->numcaches*sizeof(*infos->cache)); + cache = &infos->cache[cachenum]; + + cache->type = 1; + cache->level = level; + if (level <= 2) + cache->nbthreads_sharing = 1; + else + cache->nbthreads_sharing = infos->max_log_proc; + cache->linesize = cpuid & 0xff; + cache->linepart = 0; + if (level == 1) { + cache->ways = (cpuid >> 16) & 0xff; + if (cache->ways == 0xff) + /* Fully associative */ + cache->ways = -1; + } else { + static const unsigned ways_tab[] = { 0, 1, 2, 0, 4, 0, 8, 0, 16, 0, 32, 48, 64, 96, 128, -1 }; + unsigned ways = (cpuid >> 12) & 0xf; + cache->ways = ways_tab[ways]; + } + cache->size = size; + cache->sets = 0; + + hwloc_debug("cache L%u t%u linesize %u ways %u size %uKB\n", cache->level, cache->nbthreads_sharing, cache->linesize, cache->ways, cache->size >> 10); +} + +/* Fetch information from the processor itself thanks to cpuid and store it in + * infos for summarize to analyze them globally */ +static void look_proc(struct procinfo *infos, unsigned highest_cpuid, unsigned highest_ext_cpuid, unsigned *features, enum cpuid_type cpuid_type) +{ + unsigned eax, ebx, ecx = 0, edx; + unsigned cachenum; + struct cacheinfo *cache; + unsigned regs[4]; + unsigned _model, _extendedmodel, _family, _extendedfamily; + + infos->present = 1; + + eax = 0x01; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + infos->apicid = ebx >> 24; + if (edx & (1 << 28)) + infos->max_log_proc = 1 << hwloc_flsl(((ebx >> 16) & 0xff) - 1); + else + infos->max_log_proc = 1; + hwloc_debug("APIC ID 0x%02x max_log_proc %u\n", infos->apicid, 
+              infos->max_log_proc);
+  infos->socketid = infos->apicid / infos->max_log_proc;
+  infos->logprocid = infos->apicid % infos->max_log_proc;
+  hwloc_debug("phys %u thread %u\n", infos->socketid, infos->logprocid);
+
+  memset(regs, 0, sizeof(regs));
+  regs[0] = 0;
+  hwloc_cpuid(&regs[0], &regs[1], &regs[3], &regs[2]);
+  memcpy(infos->cpuvendor, regs+1, 4*3);
+  infos->cpuvendor[12] = '\0';
+
+  memset(regs, 0, sizeof(regs));
+  regs[0] = 1;
+  hwloc_cpuid(&regs[0], &regs[1], &regs[2], &regs[3]);
+  _model = (regs[0]>>4) & 0xf;
+  _extendedmodel = (regs[0]>>16) & 0xf;
+  _family = (regs[0]>>8) & 0xf;
+  _extendedfamily = (regs[0]>>20) & 0xff;
+  if (!strncmp(infos->cpuvendor, "Genu", 4)
+      || (!strncmp(infos->cpuvendor, "Auth", 4) && _family == 0xf)) {
+    infos->cpufamilynumber = _family + _extendedfamily;
+    infos->cpumodelnumber = _model + (_extendedmodel << 4);
+  } else {
+    infos->cpufamilynumber = _family;
+    infos->cpumodelnumber = _model;
+  }
+
+  if (highest_ext_cpuid >= 0x80000004) {
+    memset(regs, 0, sizeof(regs));
+    regs[0] = 0x80000002;
+    hwloc_cpuid(&regs[0], &regs[1], &regs[2], &regs[3]);
+    memcpy(infos->cpumodel, regs, 4*4);
+    regs[0] = 0x80000003;
+    hwloc_cpuid(&regs[0], &regs[1], &regs[2], &regs[3]);
+    memcpy(infos->cpumodel + 4*4, regs, 4*4);
+    regs[0] = 0x80000004;
+    hwloc_cpuid(&regs[0], &regs[1], &regs[2], &regs[3]);
+    memcpy(infos->cpumodel + 4*4*2, regs, 4*4);
+    infos->cpumodel[3*4*4] = 0;
+  } else
+    infos->cpumodel[0] = 0;
+
+  /* Intel doesn't actually provide 0x80000008 information */
+  if (cpuid_type != intel && highest_ext_cpuid >= 0x80000008) {
+    unsigned coreidsize;
+    eax = 0x80000008;
+    hwloc_cpuid(&eax, &ebx, &ecx, &edx);
+    coreidsize = (ecx >> 12) & 0xf;
+    hwloc_debug("core ID size: %u\n", coreidsize);
+    if (!coreidsize) {
+      infos->max_nbcores = (ecx & 0xff) + 1;
+    } else
+      infos->max_nbcores = 1 << coreidsize;
+    hwloc_debug("Thus max # of cores: %u\n", infos->max_nbcores);
+    /* Still no multithreaded AMD */
+    infos->max_nbthreads = 1;
+    hwloc_debug("and max # of threads: %u\n", infos->max_nbthreads);
+    infos->threadid = infos->logprocid % infos->max_nbthreads;
+    infos->coreid = infos->logprocid / infos->max_nbthreads;
+    hwloc_debug("this is thread %u of core %u\n", infos->threadid, infos->coreid);
+  }
+
+  infos->numcaches = 0;
+  infos->cache = NULL;
+
+  /* AMD topology extension */
+  if (cpuid_type != intel && has_topoext(features)) {
+    unsigned apic_id, node_id, nodes_per_proc, unit_id, cores_per_unit;
+
+    eax = 0x8000001e;
+    hwloc_cpuid(&eax, &ebx, &ecx, &edx);
+    infos->apicid = apic_id = eax;
+    infos->nodeid = node_id = ecx & 0xff;
+    nodes_per_proc = ((ecx >> 8) & 7) + 1;
+    if (nodes_per_proc > 2) {
+      hwloc_debug("warning: undefined value %d, assuming it means %d\n", nodes_per_proc, nodes_per_proc);
+    }
+    infos->unitid = unit_id = ebx & 0xff;
+    cores_per_unit = ((ebx >> 8) & 3) + 1;
+    hwloc_debug("x2APIC %08x, %d nodes, node %d, %d cores in unit %d\n", apic_id, nodes_per_proc, node_id, cores_per_unit, unit_id);
+
+    for (cachenum = 0; ; cachenum++) {
+      unsigned type;
+      eax = 0x8000001d;
+      ecx = cachenum;
+      hwloc_cpuid(&eax, &ebx, &ecx, &edx);
+      type = eax & 0x1f;
+      if (type == 0)
+        break;
+      infos->numcaches++;
+    }
+
+    cache = infos->cache = malloc(infos->numcaches * sizeof(*infos->cache));
+
+    for (cachenum = 0; ; cachenum++) {
+      unsigned linesize, linepart, ways, sets;
+      unsigned type;
+      eax = 0x8000001d;
+      ecx = cachenum;
+      hwloc_cpuid(&eax, &ebx, &ecx, &edx);
+
+      type = eax & 0x1f;
+
+      if (type == 0)
+        break;
+
+      cache->type = type;
+      cache->level = (eax >> 5) & 0x7;
+      /* Note: actually number of cores */
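+      /* (CPUID leaf 0x8000001d reports in EAX[25:14] the number of cores
+       * sharing this cache, minus one, hence the +1 below.) */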
cache->nbthreads_sharing = ((eax >> 14) & 0xfff) + 1; + + cache->linesize = linesize = (ebx & 0xfff) + 1; + cache->linepart = linepart = ((ebx >> 12) & 0x3ff) + 1; + ways = ((ebx >> 22) & 0x3ff) + 1; + + if (eax & (1 << 9)) + /* Fully associative */ + cache->ways = -1; + else + cache->ways = ways; + cache->sets = sets = ecx + 1; + cache->size = linesize * linepart * ways * sets; + + hwloc_debug("cache %u type %u L%u t%u c%u linesize %u linepart %u ways %u sets %u, size %uKB\n", cachenum, cache->type, cache->level, cache->nbthreads_sharing, infos->max_nbcores, linesize, linepart, ways, sets, cache->size >> 10); + + cache++; + } + } else { + /* Intel doesn't actually provide 0x80000005 information */ + if (cpuid_type != intel && highest_ext_cpuid >= 0x80000005) { + eax = 0x80000005; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + fill_amd_cache(infos, 1, ecx); + } + + /* Intel doesn't actually provide 0x80000006 information */ + if (cpuid_type != intel && highest_ext_cpuid >= 0x80000006) { + eax = 0x80000006; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + fill_amd_cache(infos, 2, ecx); + fill_amd_cache(infos, 3, edx); + } + } + + /* AMD doesn't actually provide 0x04 information */ + if (cpuid_type != amd && highest_cpuid >= 0x04) { + for (cachenum = 0; ; cachenum++) { + unsigned type; + eax = 0x04; + ecx = cachenum; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + + type = eax & 0x1f; + + hwloc_debug("cache %u type %u\n", cachenum, type); + + if (type == 0) + break; + infos->numcaches++; + } + + cache = infos->cache = malloc(infos->numcaches * sizeof(*infos->cache)); + + for (cachenum = 0; ; cachenum++) { + unsigned linesize, linepart, ways, sets; + unsigned type; + eax = 0x04; + ecx = cachenum; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + + type = eax & 0x1f; + + if (type == 0) + break; + + cache->type = type; + cache->level = (eax >> 5) & 0x7; + cache->nbthreads_sharing = ((eax >> 14) & 0xfff) + 1; + infos->max_nbcores = ((eax >> 26) & 0x3f) + 1; + + cache->linesize = linesize = (ebx & 0xfff) + 1; + cache->linepart = linepart = ((ebx >> 12) & 0x3ff) + 1; + ways = ((ebx >> 22) & 0x3ff) + 1; + if (eax & (1 << 9)) + /* Fully associative */ + cache->ways = -1; + else + cache->ways = ways; + cache->sets = sets = ecx + 1; + cache->size = linesize * linepart * ways * sets; + + hwloc_debug("cache %u type %u L%u t%u c%u linesize %u linepart %u ways %u sets %u, size %uKB\n", cachenum, cache->type, cache->level, cache->nbthreads_sharing, infos->max_nbcores, linesize, linepart, ways, sets, cache->size >> 10); + infos->max_nbthreads = infos->max_log_proc / infos->max_nbcores; + hwloc_debug("thus %u threads\n", infos->max_nbthreads); + infos->threadid = infos->logprocid % infos->max_nbthreads; + infos->coreid = infos->logprocid / infos->max_nbthreads; + hwloc_debug("this is thread %u of core %u\n", infos->threadid, infos->coreid); + + cache++; + } + } + + if (cpuid_type == intel && highest_cpuid >= 0x0b) { + unsigned level, apic_nextshift, apic_number, apic_type, apic_id = 0, apic_shift = 0, id; + for (level = 0; ; level++) { + ecx = level; + eax = 0x0b; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + if (!eax && !ebx) + break; + } + if (level) { + infos->levels = level; + infos->otherids = malloc(level * sizeof(*infos->otherids)); + for (level = 0; ; level++) { + ecx = level; + eax = 0x0b; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + if (!eax && !ebx) + break; + apic_nextshift = eax & 0x1f; + apic_number = ebx & 0xffff; + apic_type = (ecx & 0xff00) >> 8; + apic_id = edx; + id = (apic_id >> apic_shift) & ((1 << (apic_nextshift - 
+                                            apic_shift)) - 1);
+        hwloc_debug("x2APIC %08x %d: nextshift %d num %2d type %d id %2d\n", apic_id, level, apic_nextshift, apic_number, apic_type, id);
+        infos->apicid = apic_id;
+        infos->otherids[level] = UINT_MAX;
+        switch (apic_type) {
+        case 1:
+          infos->threadid = id;
+          break;
+        case 2:
+          infos->coreid = id;
+          break;
+        default:
+          hwloc_debug("x2APIC %d: unknown type %d\n", level, apic_type);
+          infos->otherids[level] = apic_id >> apic_shift;
+          break;
+        }
+        apic_shift = apic_nextshift;
+      }
+      infos->socketid = apic_id >> apic_shift;
+      hwloc_debug("x2APIC remainder: %d\n", infos->socketid);
+    } else
+      infos->otherids = NULL;
+  } else
+    infos->otherids = NULL;
+}
+
+static void
+hwloc_x86_add_cpuinfos(hwloc_obj_t obj, struct procinfo *info, int nodup)
+{
+  char number[8];
+  hwloc_obj_add_info_nodup(obj, "CPUVendor", info->cpuvendor, nodup);
+  if (info->cpumodel[0]) {
+    const char *c = info->cpumodel;
+    while (*c == ' ')
+      c++;
+    hwloc_obj_add_info_nodup(obj, "CPUModel", c, nodup);
+  }
+  snprintf(number, sizeof(number), "%u", info->cpumodelnumber);
+  hwloc_obj_add_info_nodup(obj, "CPUModelNumber", number, nodup);
+  snprintf(number, sizeof(number), "%u", info->cpufamilynumber);
+  hwloc_obj_add_info_nodup(obj, "CPUFamilyNumber", number, nodup);
+}
+
+/* Analyse information stored in infos, and build/annotate topology levels accordingly */
+static void summarize(hwloc_topology_t topology, struct procinfo *infos, unsigned nbprocs,
+                      int fulldiscovery)
+{
+  hwloc_bitmap_t complete_cpuset = hwloc_bitmap_alloc();
+  unsigned i, j, l, level, type;
+  unsigned nbsockets = 0;
+  int one = -1;
+
+  for (i = 0; i < nbprocs; i++)
+    if (infos[i].present) {
+      hwloc_bitmap_set(complete_cpuset, i);
+      one = i;
+    }
+
+  if (one == -1) {
+    hwloc_bitmap_free(complete_cpuset);
+    return;
+  }
+
+  /* Ideally, when fulldiscovery=0, we could add any object that doesn't exist yet.
+   * But what if the x86 and the native backends disagree because one is buggy? Which one to trust?
+   * Only annotate existing objects for now.
+   */
+
+  /* Look for sockets */
+  if (fulldiscovery) {
+    hwloc_bitmap_t sockets_cpuset = hwloc_bitmap_dup(complete_cpuset);
+    hwloc_bitmap_t socket_cpuset;
+    hwloc_obj_t socket;
+
+    while ((i = hwloc_bitmap_first(sockets_cpuset)) != (unsigned) -1) {
+      unsigned socketid = infos[i].socketid;
+
+      socket_cpuset = hwloc_bitmap_alloc();
+      for (j = i; j < nbprocs; j++) {
+        if (infos[j].socketid == socketid) {
+          hwloc_bitmap_set(socket_cpuset, j);
+          hwloc_bitmap_clr(sockets_cpuset, j);
+        }
+      }
+      socket = hwloc_alloc_setup_object(HWLOC_OBJ_SOCKET, socketid);
+      socket->cpuset = socket_cpuset;
+
+      hwloc_x86_add_cpuinfos(socket, &infos[i], 0);
+
+      hwloc_debug_1arg_bitmap("os socket %u has cpuset %s\n",
+                              socketid, socket_cpuset);
+      hwloc_insert_object_by_cpuset(topology, socket);
+      nbsockets++;
+    }
+    hwloc_bitmap_free(sockets_cpuset);
+
+  } else {
+    /* Annotate previously-existing sockets */
+    hwloc_obj_t socket = NULL;
+    int same = 1;
+    nbsockets = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET);
+    /* check whether all sockets have the same info */
+    for(i=1; i<nbprocs; i++) {
+      if (strcmp(infos[i].cpumodel, infos[0].cpumodel)) {
+        same = 0;
+        break;
+      }
+    }
+    /* annotate each socket */
+    while ((socket = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_SOCKET, socket)) != NULL) {
+      if (socket->os_index == (unsigned) -1) {
+        /* try to fix the socket OS index if unknown.
+         * FIXME: ideally, we should check all bits in case x86 and the native backend disagree.
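+         * Taking the first PU of the socket cpuset below assumes that both
+         * backends number PUs the same way.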
+ */ + for(i=0; icpuset, i)) { + socket->os_index = infos[i].socketid; + break; + } + } + } + for(i=0; ios_index || (same && socket->os_index == (unsigned) -1)) { + hwloc_x86_add_cpuinfos(socket, &infos[i], 1); + break; + } + } + } + } + /* If there was no socket, annotate the Machine instead */ + if ((!nbsockets) && infos[0].cpumodel[0]) { + hwloc_x86_add_cpuinfos(hwloc_get_root_obj(topology), &infos[0], 1); + } + + /* Look for Numa nodes inside sockets */ + if (fulldiscovery) { + hwloc_bitmap_t nodes_cpuset = hwloc_bitmap_dup(complete_cpuset); + hwloc_bitmap_t node_cpuset; + hwloc_obj_t node; + + while ((i = hwloc_bitmap_first(nodes_cpuset)) != (unsigned) -1) { + unsigned socketid = infos[i].socketid; + unsigned nodeid = infos[i].nodeid; + + if (nodeid == (unsigned)-1) { + hwloc_bitmap_clr(nodes_cpuset, i); + continue; + } + + node_cpuset = hwloc_bitmap_alloc(); + for (j = i; j < nbprocs; j++) { + if (infos[j].nodeid == (unsigned) -1) { + hwloc_bitmap_clr(nodes_cpuset, j); + continue; + } + + if (infos[j].socketid == socketid && infos[j].nodeid == nodeid) { + hwloc_bitmap_set(node_cpuset, j); + hwloc_bitmap_clr(nodes_cpuset, j); + } + } + node = hwloc_alloc_setup_object(HWLOC_OBJ_NODE, nodeid); + node->cpuset = node_cpuset; + hwloc_debug_1arg_bitmap("os node %u has cpuset %s\n", + nodeid, node_cpuset); + hwloc_insert_object_by_cpuset(topology, node); + } + hwloc_bitmap_free(nodes_cpuset); + } + + /* Look for Compute units inside sockets */ + if (fulldiscovery) { + hwloc_bitmap_t units_cpuset = hwloc_bitmap_dup(complete_cpuset); + hwloc_bitmap_t unit_cpuset; + hwloc_obj_t unit; + + while ((i = hwloc_bitmap_first(units_cpuset)) != (unsigned) -1) { + unsigned socketid = infos[i].socketid; + unsigned unitid = infos[i].unitid; + + if (unitid == (unsigned)-1) { + hwloc_bitmap_clr(units_cpuset, i); + continue; + } + + unit_cpuset = hwloc_bitmap_alloc(); + for (j = i; j < nbprocs; j++) { + if (infos[j].unitid == (unsigned) -1) { + hwloc_bitmap_clr(units_cpuset, j); + continue; + } + + if (infos[j].socketid == socketid && infos[j].unitid == unitid) { + hwloc_bitmap_set(unit_cpuset, j); + hwloc_bitmap_clr(units_cpuset, j); + } + } + unit = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, unitid); + unit->cpuset = unit_cpuset; + hwloc_debug_1arg_bitmap("os unit %u has cpuset %s\n", + unitid, unit_cpuset); + hwloc_insert_object_by_cpuset(topology, unit); + } + hwloc_bitmap_free(units_cpuset); + } + + /* Look for unknown objects */ + if (infos[one].otherids) { + for (level = infos[one].levels-1; level <= infos[one].levels-1; level--) { + if (infos[one].otherids[level] != UINT_MAX) { + hwloc_bitmap_t unknowns_cpuset = hwloc_bitmap_dup(complete_cpuset); + hwloc_bitmap_t unknown_cpuset; + hwloc_obj_t unknown_obj; + + while ((i = hwloc_bitmap_first(unknowns_cpuset)) != (unsigned) -1) { + unsigned unknownid = infos[i].otherids[level]; + + unknown_cpuset = hwloc_bitmap_alloc(); + for (j = i; j < nbprocs; j++) { + if (infos[j].otherids[level] == unknownid) { + hwloc_bitmap_set(unknown_cpuset, j); + hwloc_bitmap_clr(unknowns_cpuset, j); + } + } + unknown_obj = hwloc_alloc_setup_object(HWLOC_OBJ_MISC, unknownid); + unknown_obj->cpuset = unknown_cpuset; + unknown_obj->os_level = level; + hwloc_debug_2args_bitmap("os unknown%d %u has cpuset %s\n", + level, unknownid, unknown_cpuset); + hwloc_insert_object_by_cpuset(topology, unknown_obj); + } + hwloc_bitmap_free(unknowns_cpuset); + } + } + } + + /* Look for cores */ + if (fulldiscovery) { + hwloc_bitmap_t cores_cpuset = hwloc_bitmap_dup(complete_cpuset); + 
hwloc_bitmap_t core_cpuset; + hwloc_obj_t core; + + while ((i = hwloc_bitmap_first(cores_cpuset)) != (unsigned) -1) { + unsigned socketid = infos[i].socketid; + unsigned coreid = infos[i].coreid; + + if (coreid == (unsigned) -1) { + hwloc_bitmap_clr(cores_cpuset, i); + continue; + } + + core_cpuset = hwloc_bitmap_alloc(); + for (j = i; j < nbprocs; j++) { + if (infos[j].coreid == (unsigned) -1) { + hwloc_bitmap_clr(cores_cpuset, j); + continue; + } + + if (infos[j].socketid == socketid && infos[j].coreid == coreid) { + hwloc_bitmap_set(core_cpuset, j); + hwloc_bitmap_clr(cores_cpuset, j); + } + } + core = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, coreid); + core->cpuset = core_cpuset; + hwloc_debug_1arg_bitmap("os core %u has cpuset %s\n", + coreid, core_cpuset); + hwloc_insert_object_by_cpuset(topology, core); + } + hwloc_bitmap_free(cores_cpuset); + } + + /* Look for caches */ + /* First find max level */ + level = 0; + for (i = 0; i < nbprocs; i++) + for (j = 0; j < infos[i].numcaches; j++) + if (infos[i].cache[j].level > level) + level = infos[i].cache[j].level; + + /* Look for known types */ + if (fulldiscovery) while (level > 0) { + for (type = 1; type <= 3; type++) { + /* Look for caches of that type at level level */ + { + hwloc_bitmap_t caches_cpuset = hwloc_bitmap_dup(complete_cpuset); + hwloc_bitmap_t cache_cpuset; + hwloc_obj_t cache; + + while ((i = hwloc_bitmap_first(caches_cpuset)) != (unsigned) -1) { + unsigned socketid = infos[i].socketid; + + for (l = 0; l < infos[i].numcaches; l++) { + if (infos[i].cache[l].level == level && infos[i].cache[l].type == type) + break; + } + if (l == infos[i].numcaches) { + /* no cache Llevel of that type in i */ + hwloc_bitmap_clr(caches_cpuset, i); + continue; + } + + /* Found a matching cache, now look for others sharing it */ + { + unsigned cacheid = infos[i].apicid / infos[i].cache[l].nbthreads_sharing; + + cache_cpuset = hwloc_bitmap_alloc(); + for (j = i; j < nbprocs; j++) { + unsigned l2; + for (l2 = 0; l2 < infos[j].numcaches; l2++) { + if (infos[j].cache[l2].level == level && infos[j].cache[l2].type == type) + break; + } + if (l2 == infos[j].numcaches) { + /* no cache Llevel of that type in j */ + hwloc_bitmap_clr(caches_cpuset, j); + continue; + } + if (infos[j].socketid == socketid && infos[j].apicid / infos[j].cache[l2].nbthreads_sharing == cacheid) { + hwloc_bitmap_set(cache_cpuset, j); + hwloc_bitmap_clr(caches_cpuset, j); + } + } + cache = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, cacheid); + cache->attr->cache.depth = level; + cache->attr->cache.size = infos[i].cache[l].size; + cache->attr->cache.linesize = infos[i].cache[l].linesize; + cache->attr->cache.associativity = infos[i].cache[l].ways; + switch (infos[i].cache[l].type) { + case 1: + cache->attr->cache.type = HWLOC_OBJ_CACHE_DATA; + break; + case 2: + cache->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION; + break; + case 3: + cache->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; + break; + } + cache->cpuset = cache_cpuset; + hwloc_debug_2args_bitmap("os L%u cache %u has cpuset %s\n", + level, cacheid, cache_cpuset); + hwloc_insert_object_by_cpuset(topology, cache); + } + } + hwloc_bitmap_free(caches_cpuset); + } + } + level--; + } + + for (i = 0; i < nbprocs; i++) { + free(infos[i].cache); + if (infos[i].otherids) + free(infos[i].otherids); + } + + hwloc_bitmap_free(complete_cpuset); +} + +#if defined HWLOC_FREEBSD_SYS && defined HAVE_CPUSET_SETID +#include +#include +typedef cpusetid_t hwloc_x86_os_state_t; +static void hwloc_x86_os_state_save(hwloc_x86_os_state_t 
*state) +{ + /* temporary make all cpus available during discovery */ + cpuset_getid(CPU_LEVEL_CPUSET, CPU_WHICH_PID, -1, state); + cpuset_setid(CPU_WHICH_PID, -1, 0); +} +static void hwloc_x86_os_state_restore(hwloc_x86_os_state_t *state) +{ + /* restore initial cpuset */ + cpuset_setid(CPU_WHICH_PID, -1, *state); +} +#else /* !defined HWLOC_FREEBSD_SYS || !defined HAVE_CPUSET_SETID */ +typedef void * hwloc_x86_os_state_t; +static void hwloc_x86_os_state_save(hwloc_x86_os_state_t *state __hwloc_attribute_unused) { } +static void hwloc_x86_os_state_restore(hwloc_x86_os_state_t *state __hwloc_attribute_unused) { } +#endif /* !defined HWLOC_FREEBSD_SYS || !defined HAVE_CPUSET_SETID */ + + +#define INTEL_EBX ('G' | ('e'<<8) | ('n'<<16) | ('u'<<24)) +#define INTEL_EDX ('i' | ('n'<<8) | ('e'<<16) | ('I'<<24)) +#define INTEL_ECX ('n' | ('t'<<8) | ('e'<<16) | ('l'<<24)) + +#define AMD_EBX ('A' | ('u'<<8) | ('t'<<16) | ('h'<<24)) +#define AMD_EDX ('e' | ('n'<<8) | ('t'<<16) | ('i'<<24)) +#define AMD_ECX ('c' | ('A'<<8) | ('M'<<16) | ('D'<<24)) + +static +int hwloc_look_x86(struct hwloc_topology *topology, unsigned nbprocs, int fulldiscovery) +{ + unsigned eax, ebx, ecx = 0, edx; + hwloc_bitmap_t orig_cpuset; + unsigned i; + unsigned highest_cpuid; + unsigned highest_ext_cpuid; + /* This stores cpuid features with the same indexing as Linux */ + unsigned features[10] = { 0 }; + struct procinfo *infos = NULL; + enum cpuid_type cpuid_type = unknown; + hwloc_x86_os_state_t os_state; + struct hwloc_binding_hooks hooks; + struct hwloc_topology_support support; + struct hwloc_topology_membind_support memsupport __hwloc_attribute_unused; + int ret = -1; + + memset(&hooks, 0, sizeof(hooks)); + support.membind = &memsupport; + hwloc_set_native_binding_hooks(&hooks, &support); + if (nbprocs > 1 && + !(hooks.get_thisproc_cpubind && hooks.set_thisproc_cpubind) + && !(hooks.get_thisthread_cpubind && hooks.set_thisthread_cpubind)) + goto out; + + if (!hwloc_have_cpuid()) + goto out; + + infos = calloc(nbprocs, sizeof(struct procinfo)); + if (NULL == infos) + goto out; + for (i = 0; i < nbprocs; i++) { + infos[i].nodeid = (unsigned) -1; + infos[i].socketid = (unsigned) -1; + infos[i].unitid = (unsigned) -1; + infos[i].coreid = (unsigned) -1; + infos[i].threadid = (unsigned) -1; + } + + eax = 0x00; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + highest_cpuid = eax; + if (ebx == INTEL_EBX && ecx == INTEL_ECX && edx == INTEL_EDX) + cpuid_type = intel; + if (ebx == AMD_EBX && ecx == AMD_ECX && edx == AMD_EDX) + cpuid_type = amd; + + hwloc_debug("highest cpuid %x, cpuid type %u\n", highest_cpuid, cpuid_type); + if (highest_cpuid < 0x01) { + goto out_with_infos; + } + + eax = 0x01; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + features[0] = edx; + features[4] = ecx; + + eax = 0x80000000; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + highest_ext_cpuid = eax; + + hwloc_debug("highest extended cpuid %x\n", highest_ext_cpuid); + + if (highest_cpuid >= 0x7) { + eax = 0x7; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + features[9] = ebx; + } + + if (cpuid_type != intel && highest_ext_cpuid >= 0x80000001) { + eax = 0x80000001; + hwloc_cpuid(&eax, &ebx, &ecx, &edx); + features[1] = edx; + features[6] = ecx; + } + + hwloc_x86_os_state_save(&os_state); + + orig_cpuset = hwloc_bitmap_alloc(); + + if (hooks.get_thisthread_cpubind && hooks.set_thisthread_cpubind) { + if (!hooks.get_thisthread_cpubind(topology, orig_cpuset, HWLOC_CPUBIND_STRICT)) { + hwloc_bitmap_t set = hwloc_bitmap_alloc(); + for (i = 0; i < nbprocs; i++) { + hwloc_bitmap_only(set, i); 
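+        /* restrict execution to CPU i so that the cpuid instructions in look_proc() run there */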
+ hwloc_debug("binding to CPU%d\n", i); + if (hooks.set_thisthread_cpubind(topology, set, HWLOC_CPUBIND_STRICT)) { + hwloc_debug("could not bind to CPU%d: %s\n", i, strerror(errno)); + continue; + } + look_proc(&infos[i], highest_cpuid, highest_ext_cpuid, features, cpuid_type); + } + hwloc_bitmap_free(set); + hooks.set_thisthread_cpubind(topology, orig_cpuset, 0); + hwloc_bitmap_free(orig_cpuset); + summarize(topology, infos, nbprocs, fulldiscovery); + ret = fulldiscovery; /* success, but objects added only if fulldiscovery */ + goto out_with_os_state; + } + } + + if (hooks.get_thisproc_cpubind && hooks.set_thisproc_cpubind) { + if (!hooks.get_thisproc_cpubind(topology, orig_cpuset, HWLOC_CPUBIND_STRICT)) { + hwloc_bitmap_t set = hwloc_bitmap_alloc(); + for (i = 0; i < nbprocs; i++) { + hwloc_bitmap_only(set, i); + hwloc_debug("binding to CPU%d\n", i); + if (hooks.set_thisproc_cpubind(topology, set, HWLOC_CPUBIND_STRICT)) { + hwloc_debug("could not bind to CPU%d: %s\n", i, strerror(errno)); + continue; + } + look_proc(&infos[i], highest_cpuid, highest_ext_cpuid, features, cpuid_type); + } + hwloc_bitmap_free(set); + hooks.set_thisproc_cpubind(topology, orig_cpuset, 0); + hwloc_bitmap_free(orig_cpuset); + summarize(topology, infos, nbprocs, fulldiscovery); + ret = fulldiscovery; /* success, but objects added only if fulldiscovery */ + goto out_with_os_state; + } + } + + if (nbprocs == 1) { + look_proc(&infos[0], highest_cpuid, highest_ext_cpuid, features, cpuid_type); + summarize(topology, infos, nbprocs, fulldiscovery); + ret = fulldiscovery; + } + + hwloc_bitmap_free(orig_cpuset); + +out_with_os_state: + hwloc_x86_os_state_restore(&os_state); + +out_with_infos: + if (NULL != infos) { + free(infos); + } + +out: + return ret; +} + +static int +hwloc_x86_discover(struct hwloc_backend *backend) +{ + struct hwloc_topology *topology = backend->topology; + unsigned nbprocs = hwloc_fallback_nbprocessors(topology); + int alreadypus = 0; + int ret; + + if (!topology->is_thissystem) { + hwloc_debug("%s", "\nno x86 detection (not thissystem)\n"); + return 0; + } + + if (topology->levels[0][0]->cpuset) { + /* somebody else discovered things */ + if (topology->nb_levels == 2 && topology->level_nbobjects[1] == nbprocs) { + /* only PUs were discovered, as much as we would, complete the topology with everything else */ + alreadypus = 1; + goto fulldiscovery; + } + + /* several object types were added, we can't easily complete, just annotate a bit */ + ret = hwloc_look_x86(topology, nbprocs, 0); + if (ret) + hwloc_obj_add_info(topology->levels[0][0], "Backend", "x86"); + return 0; + } else { + /* topology is empty, initialize it */ + hwloc_alloc_obj_cpusets(topology->levels[0][0]); + } + +fulldiscovery: + hwloc_look_x86(topology, nbprocs, 1); + /* if failed, just continue and create PUs */ + + if (!alreadypus) + hwloc_setup_pu_level(topology, nbprocs); + + hwloc_obj_add_info(topology->levels[0][0], "Backend", "x86"); + +#ifdef HAVE_UNAME + hwloc_add_uname_info(topology); /* we already know is_thissystem() is true */ +#else + /* uname isn't available, manually setup the "Architecture" info */ +#ifdef HWLOC_X86_64_ARCH + hwloc_obj_add_info(topology->levels[0][0], "Architecture", "x86_64"); +#else + hwloc_obj_add_info(topology->levels[0][0], "Architecture", "x86"); +#endif +#endif + return 1; +} + +static struct hwloc_backend * +hwloc_x86_component_instantiate(struct hwloc_disc_component *component, + const void *_data1 __hwloc_attribute_unused, + const void *_data2 __hwloc_attribute_unused, + const void 
*_data3 __hwloc_attribute_unused) +{ + struct hwloc_backend *backend; + backend = hwloc_backend_alloc(component); + if (!backend) + return NULL; + backend->flags = HWLOC_BACKEND_FLAG_NEED_LEVELS; + backend->discover = hwloc_x86_discover; + return backend; +} + +static struct hwloc_disc_component hwloc_x86_disc_component = { + HWLOC_DISC_COMPONENT_TYPE_CPU, + "x86", + HWLOC_DISC_COMPONENT_TYPE_GLOBAL, + hwloc_x86_component_instantiate, + 45, /* between native and no_os */ + NULL +}; + +const struct hwloc_component hwloc_x86_component = { + HWLOC_COMPONENT_ABI, + HWLOC_COMPONENT_TYPE_DISC, + 0, + &hwloc_x86_disc_component +}; diff --git a/ext/hwloc/src/topology.c b/ext/hwloc/src/topology.c new file mode 100644 index 000000000..f569dc8b5 --- /dev/null +++ b/ext/hwloc/src/topology.c @@ -0,0 +1,3115 @@ +/* + * Copyright © 2009 CNRS + * Copyright © 2009-2014 Inria. All rights reserved. + * Copyright © 2009-2012 Université Bordeaux 1 + * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + */ + +#include + +#define _ATFILE_SOURCE +#include +#include +#ifdef HAVE_DIRENT_H +#include +#endif +#ifdef HAVE_UNISTD_H +#include +#endif +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#ifdef HAVE_MACH_MACH_INIT_H +#include +#endif +#ifdef HAVE_MACH_MACH_HOST_H +#include +#endif + +#ifdef HAVE_SYS_PARAM_H +#include +#endif + +#ifdef HAVE_SYS_SYSCTL_H +#include +#endif + +#ifdef HWLOC_WIN_SYS +#include +#endif + +unsigned hwloc_get_api_version(void) +{ + return HWLOC_API_VERSION; +} + +int hwloc_hide_errors(void) +{ + static int hide = 0; + static int checked = 0; + if (!checked) { + const char *envvar = getenv("HWLOC_HIDE_ERRORS"); + if (envvar) + hide = atoi(envvar); + checked = 1; + } + return hide; +} + +void hwloc_report_os_error(const char *msg, int line) +{ + static int reported = 0; + + if (!reported && !hwloc_hide_errors()) { + fprintf(stderr, "****************************************************************************\n"); + fprintf(stderr, "* hwloc has encountered what looks like an error from the operating system.\n"); + fprintf(stderr, "*\n"); + fprintf(stderr, "* %s\n", msg); + fprintf(stderr, "* Error occurred in topology.c line %d\n", line); + fprintf(stderr, "*\n"); + fprintf(stderr, "* Please report this error message to the hwloc user's mailing list,\n"); +#ifdef HWLOC_LINUX_SYS + fprintf(stderr, "* along with the output from the hwloc-gather-topology.sh script.\n"); +#else + fprintf(stderr, "* along with any relevant topology information from your platform.\n"); +#endif + fprintf(stderr, "****************************************************************************\n"); + reported = 1; + } +} + +#if defined(HAVE_SYSCTLBYNAME) +int hwloc_get_sysctlbyname(const char *name, int64_t *ret) +{ + union { + int32_t i32; + int64_t i64; + } n; + size_t size = sizeof(n); + if (sysctlbyname(name, &n, &size, NULL, 0)) + return -1; + switch (size) { + case sizeof(n.i32): + *ret = n.i32; + break; + case sizeof(n.i64): + *ret = n.i64; + break; + default: + return -1; + } + return 0; +} +#endif + +#if defined(HAVE_SYSCTL) +int hwloc_get_sysctl(int name[], unsigned namelen, int *ret) +{ + int n; + size_t size = sizeof(n); + if (sysctl(name, namelen, &n, &size, NULL, 0)) + return -1; + if (size != sizeof(n)) + return -1; + *ret = n; + return 0; +} +#endif + +/* Return the OS-provided number of processors. 
+   Unlike other methods such as
+   reading sysfs on Linux, this method is not virtualizable; thus it's only
+   used as a fall-back method, allowing `hwloc_set_fsroot ()' to
+   have the desired effect. */
+unsigned
+hwloc_fallback_nbprocessors(struct hwloc_topology *topology) {
+  int n;
+#if HAVE_DECL__SC_NPROCESSORS_ONLN
+  n = sysconf(_SC_NPROCESSORS_ONLN);
+#elif HAVE_DECL__SC_NPROC_ONLN
+  n = sysconf(_SC_NPROC_ONLN);
+#elif HAVE_DECL__SC_NPROCESSORS_CONF
+  n = sysconf(_SC_NPROCESSORS_CONF);
+#elif HAVE_DECL__SC_NPROC_CONF
+  n = sysconf(_SC_NPROC_CONF);
+#elif defined(HAVE_HOST_INFO) && HAVE_HOST_INFO
+  struct host_basic_info info;
+  mach_msg_type_number_t count = HOST_BASIC_INFO_COUNT;
+  host_info(mach_host_self(), HOST_BASIC_INFO, (integer_t*) &info, &count);
+  n = info.avail_cpus;
+#elif defined(HAVE_SYSCTLBYNAME)
+  int64_t nn;
+  if (hwloc_get_sysctlbyname("hw.ncpu", &nn))
+    nn = -1;
+  n = nn;
+#elif defined(HAVE_SYSCTL) && HAVE_DECL_CTL_HW && HAVE_DECL_HW_NCPU
+  static int name[2] = {CTL_HW, HW_NCPU};
+  if (hwloc_get_sysctl(name, sizeof(name)/sizeof(*name), &n))
+    n = -1;
+#elif defined(HWLOC_WIN_SYS)
+  SYSTEM_INFO sysinfo;
+  GetSystemInfo(&sysinfo);
+  n = sysinfo.dwNumberOfProcessors;
+#else
+#ifdef __GNUC__
+#warning No known way to discover number of available processors on this system
+#warning hwloc_fallback_nbprocessors will default to 1
+#endif
+  n = -1;
+#endif
+  if (n >= 1)
+    topology->support.discovery->pu = 1;
+  else
+    n = 1;
+  return n;
+}
+
+/*
+ * Use the given number of processors and the optional online cpuset if given
+ * to set a PU level.
+ */
+void
+hwloc_setup_pu_level(struct hwloc_topology *topology,
+                     unsigned nb_pus)
+{
+  struct hwloc_obj *obj;
+  unsigned oscpu,cpu;
+
+  hwloc_debug("%s", "\n\n * CPU cpusets *\n\n");
+  for (cpu=0,oscpu=0; cpu<nb_pus; oscpu++) {
+    obj = hwloc_alloc_setup_object(HWLOC_OBJ_PU, oscpu);
+    obj->cpuset = hwloc_bitmap_alloc();
+    hwloc_bitmap_only(obj->cpuset, oscpu);
+
+    hwloc_debug_2args_bitmap("cpu %u (os %u) has cpuset %s\n",
+                             cpu, oscpu, obj->cpuset);
+    hwloc_insert_object_by_cpuset(topology, obj);
+
+    cpu++;
+  }
+}
+
+static void
+print_object(struct hwloc_topology *topology, int indent __hwloc_attribute_unused, hwloc_obj_t obj)
+{
+  char line[256], *cpuset = NULL;
+  hwloc_debug("%*s", 2*indent, "");
+  hwloc_obj_snprintf(line, sizeof(line), topology, obj, "#", 1);
+  hwloc_debug("%s", line);
+  if (obj->name)
+    hwloc_debug(" name %s", obj->name);
+  if (obj->cpuset) {
+    hwloc_bitmap_asprintf(&cpuset, obj->cpuset);
+    hwloc_debug(" cpuset %s", cpuset);
+    free(cpuset);
+  }
+  if (obj->complete_cpuset) {
+    hwloc_bitmap_asprintf(&cpuset, obj->complete_cpuset);
+    hwloc_debug(" complete %s", cpuset);
+    free(cpuset);
+  }
+  if (obj->online_cpuset) {
+    hwloc_bitmap_asprintf(&cpuset, obj->online_cpuset);
+    hwloc_debug(" online %s", cpuset);
+    free(cpuset);
+  }
+  if (obj->allowed_cpuset) {
+    hwloc_bitmap_asprintf(&cpuset, obj->allowed_cpuset);
+    hwloc_debug(" allowed %s", cpuset);
+    free(cpuset);
+  }
+  if (obj->nodeset) {
+    hwloc_bitmap_asprintf(&cpuset, obj->nodeset);
+    hwloc_debug(" nodeset %s", cpuset);
+    free(cpuset);
+  }
+  if (obj->complete_nodeset) {
+    hwloc_bitmap_asprintf(&cpuset, obj->complete_nodeset);
+    hwloc_debug(" completeN %s", cpuset);
+    free(cpuset);
+  }
+  if (obj->allowed_nodeset) {
+    hwloc_bitmap_asprintf(&cpuset, obj->allowed_nodeset);
+    hwloc_debug(" allowedN %s", cpuset);
+    free(cpuset);
+  }
+  if (obj->arity)
+    hwloc_debug(" arity %u", obj->arity);
+  hwloc_debug("%s", "\n");
+}
+
+/* Just for debugging.
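+   Recursively dumps an object and all of its children through hwloc_debug();
+   compiled out unless HWLOC_DEBUG is defined.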
+ */
+static void
+print_objects(struct hwloc_topology *topology __hwloc_attribute_unused, int indent __hwloc_attribute_unused, hwloc_obj_t obj __hwloc_attribute_unused)
+{
+#ifdef HWLOC_DEBUG
+  print_object(topology, indent, obj);
+  for (obj = obj->first_child; obj; obj = obj->next_sibling)
+    print_objects(topology, indent + 1, obj);
+#endif
+}
+
+void hwloc_obj_add_info(hwloc_obj_t obj, const char *name, const char *value)
+{
+#define OBJECT_INFO_ALLOC 8
+  /* nothing allocated initially, (re-)allocate by multiple of 8 */
+  unsigned alloccount = (obj->infos_count + 1 + (OBJECT_INFO_ALLOC-1)) & ~(OBJECT_INFO_ALLOC-1);
+  if (obj->infos_count != alloccount)
+    obj->infos = realloc(obj->infos, alloccount*sizeof(*obj->infos));
+  obj->infos[obj->infos_count].name = strdup(name);
+  obj->infos[obj->infos_count].value = strdup(value);
+  obj->infos_count++;
+}
+
+void hwloc_obj_add_info_nodup(hwloc_obj_t obj, const char *name, const char *value, int nodup)
+{
+  if (nodup && hwloc_obj_get_info_by_name(obj, name))
+    return;
+  hwloc_obj_add_info(obj, name, value);
+}
+
+/* Free an object and all its content. */
+void
+hwloc_free_unlinked_object(hwloc_obj_t obj)
+{
+  unsigned i;
+  switch (obj->type) {
+  default:
+    break;
+  }
+  for(i=0; i<obj->infos_count; i++) {
+    free(obj->infos[i].name);
+    free(obj->infos[i].value);
+  }
+  free(obj->infos);
+  hwloc_clear_object_distances(obj);
+  free(obj->memory.page_types);
+  free(obj->attr);
+  free(obj->children);
+  free(obj->name);
+  hwloc_bitmap_free(obj->cpuset);
+  hwloc_bitmap_free(obj->complete_cpuset);
+  hwloc_bitmap_free(obj->online_cpuset);
+  hwloc_bitmap_free(obj->allowed_cpuset);
+  hwloc_bitmap_free(obj->nodeset);
+  hwloc_bitmap_free(obj->complete_nodeset);
+  hwloc_bitmap_free(obj->allowed_nodeset);
+  free(obj);
+}
+
+static void
+hwloc__duplicate_object(struct hwloc_obj *newobj,
+                        struct hwloc_obj *src)
+{
+  size_t len;
+  unsigned i;
+
+  newobj->type = src->type;
+  newobj->os_index = src->os_index;
+
+  if (src->name)
+    newobj->name = strdup(src->name);
+  newobj->userdata = src->userdata;
+
+  memcpy(&newobj->memory, &src->memory, sizeof(struct hwloc_obj_memory_s));
+  if (src->memory.page_types_len) {
+    len = src->memory.page_types_len * sizeof(struct hwloc_obj_memory_page_type_s);
+    newobj->memory.page_types = malloc(len);
+    memcpy(newobj->memory.page_types, src->memory.page_types, len);
+  }
+
+  memcpy(newobj->attr, src->attr, sizeof(*newobj->attr));
+
+  newobj->cpuset = hwloc_bitmap_dup(src->cpuset);
+  newobj->complete_cpuset = hwloc_bitmap_dup(src->complete_cpuset);
+  newobj->allowed_cpuset = hwloc_bitmap_dup(src->allowed_cpuset);
+  newobj->online_cpuset = hwloc_bitmap_dup(src->online_cpuset);
+  newobj->nodeset = hwloc_bitmap_dup(src->nodeset);
+  newobj->complete_nodeset = hwloc_bitmap_dup(src->complete_nodeset);
+  newobj->allowed_nodeset = hwloc_bitmap_dup(src->allowed_nodeset);
+
+  /* don't duplicate distances, they'll be recreated at the end of the topology build */
+
+  for(i=0; i<src->infos_count; i++)
+    hwloc_obj_add_info(newobj, src->infos[i].name, src->infos[i].value);
+}
+
+void
+hwloc__duplicate_objects(struct hwloc_topology *newtopology,
+                         struct hwloc_obj *newparent,
+                         struct hwloc_obj *src)
+{
+  hwloc_obj_t newobj;
+  hwloc_obj_t child;
+
+  newobj = hwloc_alloc_setup_object(src->type, src->os_index);
+  hwloc__duplicate_object(newobj, src);
+
+  child = NULL;
+  while ((child = hwloc_get_next_child(newtopology, src, child)) != NULL)
+    hwloc__duplicate_objects(newtopology, newobj, child);
+
+  hwloc_insert_object_by_parent(newtopology, newparent, newobj);
+}
+
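+/* Duplicate a whole topology without re-running discovery.
+ * Minimal usage sketch (error handling mostly omitted):
+ *
+ *   hwloc_topology_t copy;
+ *   if (hwloc_topology_dup(&copy, topology) == 0) {
+ *     ... work on the copy, e.g. from another thread ...
+ *     hwloc_topology_destroy(copy);
+ *   }
+ *
+ * The source topology must already be loaded, otherwise EINVAL is returned.
+ */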
+int
+hwloc_topology_dup(hwloc_topology_t *newp,
+                   hwloc_topology_t old)
+{
+  hwloc_topology_t new;
+  hwloc_obj_t newroot;
+  hwloc_obj_t oldroot = hwloc_get_root_obj(old);
+  unsigned i;
+
+  if (!old->is_loaded) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  hwloc_topology_init(&new);
+
+  new->flags = old->flags;
+  memcpy(new->ignored_types, old->ignored_types, sizeof(old->ignored_types));
+  new->is_thissystem = old->is_thissystem;
+  new->is_loaded = 1;
+  new->pid = old->pid;
+
+  memcpy(&new->binding_hooks, &old->binding_hooks, sizeof(old->binding_hooks));
+
+  memcpy(new->support.discovery, old->support.discovery, sizeof(*old->support.discovery));
+  memcpy(new->support.cpubind, old->support.cpubind, sizeof(*old->support.cpubind));
+  memcpy(new->support.membind, old->support.membind, sizeof(*old->support.membind));
+
+  new->userdata_export_cb = old->userdata_export_cb;
+  new->userdata_import_cb = old->userdata_import_cb;
+
+  newroot = hwloc_get_root_obj(new);
+  hwloc__duplicate_object(newroot, oldroot);
+  for(i=0; i<oldroot->arity; i++)
+    hwloc__duplicate_objects(new, newroot, oldroot->children[i]);
+
+  if (old->first_osdist) {
+    struct hwloc_os_distances_s *olddist = old->first_osdist;
+    while (olddist) {
+      struct hwloc_os_distances_s *newdist = malloc(sizeof(*newdist));
+      newdist->type = olddist->type;
+      newdist->nbobjs = olddist->nbobjs;
+      newdist->indexes = malloc(newdist->nbobjs * sizeof(*newdist->indexes));
+      memcpy(newdist->indexes, olddist->indexes, newdist->nbobjs * sizeof(*newdist->indexes));
+      newdist->objs = NULL; /* will be recomputed when needed */
+      newdist->distances = malloc(newdist->nbobjs * newdist->nbobjs * sizeof(*newdist->distances));
+      memcpy(newdist->distances, olddist->distances, newdist->nbobjs * newdist->nbobjs * sizeof(*newdist->distances));
+
+      newdist->forced = olddist->forced;
+      if (new->first_osdist) {
+        new->last_osdist->next = newdist;
+        newdist->prev = new->last_osdist;
+      } else {
+        new->first_osdist = newdist;
+        newdist->prev = NULL;
+      }
+      new->last_osdist = newdist;
+      newdist->next = NULL;
+
+      olddist = olddist->next;
+    }
+  } else
+    new->first_osdist = new->last_osdist = NULL;
+
+  /* no need to duplicate backends, topology is already loaded */
+  new->backends = NULL;
+
+  hwloc_connect_children(new->levels[0][0]);
+  if (hwloc_connect_levels(new) < 0)
+    goto out;
+
+  hwloc_distances_finalize_os(new);
+  hwloc_distances_finalize_logical(new);
+
+#ifndef HWLOC_DEBUG
+  if (getenv("HWLOC_DEBUG_CHECK"))
+#endif
+    hwloc_topology_check(new);
+
+  *newp = new;
+  return 0;
+
+ out:
+  hwloc_topology_clear(new);
+  hwloc_distances_destroy(new);
+  hwloc_topology_setup_defaults(new);
+  return -1;
+}
+
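Typical use of hwloc_topology_dup() as defined above, from application code (sketch; error handling shortened):

    #include <hwloc.h>

    int main(void)
    {
      hwloc_topology_t topo, copy;
      hwloc_topology_init(&topo);
      hwloc_topology_load(topo);               /* dup requires a loaded topology */
      if (hwloc_topology_dup(&copy, topo) == 0) {
        /* 'copy' is an independent snapshot; distance objs are recomputed on demand */
        hwloc_topology_destroy(copy);
      }
      hwloc_topology_destroy(topo);
      return 0;
    }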
+/*
+ * How to compare objects based on types.
+ *
+ * Note that HIGHER/LOWER is only a (consistent) heuristic, used to sort
+ * objects with same cpuset consistently.
+ * Only EQUAL / not EQUAL can be relied upon.
+ */
+
+enum hwloc_type_cmp_e {
+  HWLOC_TYPE_HIGHER,
+  HWLOC_TYPE_DEEPER,
+  HWLOC_TYPE_EQUAL
+};
+
+/* WARNING: The indexes of this array MUST match the ordering of the
+   obj_order_type[] array, below.  Specifically, the values must be
+   laid out such that:
+
+       obj_order_type[obj_type_order[N]] = N
+
+   for all HWLOC_OBJ_* values of N.  Put differently:
+
+       obj_type_order[A] = B
+
+   where the A values are in order of the hwloc_obj_type_t enum, and
+   the B values are the corresponding indexes of obj_order_type.
+
+   We can't use C99 syntax to initialize this in a little safer manner
+   -- bummer.  :-(
+
+   *************************************************************
+   *** DO NOT CHANGE THE ORDERING OF THIS ARRAY WITHOUT TRIPLE
+   *** CHECKING ITS CORRECTNESS!
+   *************************************************************
+   */
+static const unsigned obj_type_order[] = {
+    /* first entry is HWLOC_OBJ_SYSTEM */     0,
+    /* next entry is HWLOC_OBJ_MACHINE */     1,
+    /* next entry is HWLOC_OBJ_NODE */        3,
+    /* next entry is HWLOC_OBJ_SOCKET */      4,
+    /* next entry is HWLOC_OBJ_CACHE */       5,
+    /* next entry is HWLOC_OBJ_CORE */        6,
+    /* next entry is HWLOC_OBJ_PU */          10,
+    /* next entry is HWLOC_OBJ_GROUP */       2,
+    /* next entry is HWLOC_OBJ_MISC */        11,
+    /* next entry is HWLOC_OBJ_BRIDGE */      7,
+    /* next entry is HWLOC_OBJ_PCI_DEVICE */  8,
+    /* next entry is HWLOC_OBJ_OS_DEVICE */   9
+};
+
+static const hwloc_obj_type_t obj_order_type[] = {
+  HWLOC_OBJ_SYSTEM,
+  HWLOC_OBJ_MACHINE,
+  HWLOC_OBJ_GROUP,
+  HWLOC_OBJ_NODE,
+  HWLOC_OBJ_SOCKET,
+  HWLOC_OBJ_CACHE,
+  HWLOC_OBJ_CORE,
+  HWLOC_OBJ_BRIDGE,
+  HWLOC_OBJ_PCI_DEVICE,
+  HWLOC_OBJ_OS_DEVICE,
+  HWLOC_OBJ_PU,
+  HWLOC_OBJ_MISC,
+};
+
+/* priority to be used when merging identical parent/children object
+ * (in merge_useless_child), keep the highest priority one.
+ *
+ * Always keep Machine/PU/PCIDev/OSDev
+ * then System/Node
+ * then Core
+ * then Socket
+ * then Cache
+ * then always drop Group/Misc/Bridge.
+ *
+ * Some types won't ever actually be involved in such merging.
+ */
+static const int obj_type_priority[] = {
+  /* first entry is HWLOC_OBJ_SYSTEM */     80,
+  /* next entry is HWLOC_OBJ_MACHINE */     100,
+  /* next entry is HWLOC_OBJ_NODE */        80,
+  /* next entry is HWLOC_OBJ_SOCKET */      40,
+  /* next entry is HWLOC_OBJ_CACHE */       20,
+  /* next entry is HWLOC_OBJ_CORE */        60,
+  /* next entry is HWLOC_OBJ_PU */          100,
+  /* next entry is HWLOC_OBJ_GROUP */       0,
+  /* next entry is HWLOC_OBJ_MISC */        0,
+  /* next entry is HWLOC_OBJ_BRIDGE */      0,
+  /* next entry is HWLOC_OBJ_PCI_DEVICE */  100,
+  /* next entry is HWLOC_OBJ_OS_DEVICE */   100
+};
+
+static unsigned __hwloc_attribute_const
+hwloc_get_type_order(hwloc_obj_type_t type)
+{
+  return obj_type_order[type];
+}
+
+#if !defined(NDEBUG)
+static hwloc_obj_type_t hwloc_get_order_type(int order)
+{
+  return obj_order_type[order];
+}
+#endif
+
+static int hwloc_obj_type_is_io (hwloc_obj_type_t type)
+{
+  return type == HWLOC_OBJ_BRIDGE || type == HWLOC_OBJ_PCI_DEVICE || type == HWLOC_OBJ_OS_DEVICE;
+}
+
+int hwloc_compare_types (hwloc_obj_type_t type1, hwloc_obj_type_t type2)
+{
+  unsigned order1 = hwloc_get_type_order(type1);
+  unsigned order2 = hwloc_get_type_order(type2);
+
+  /* bridge and devices are only comparable with each other and with machine and system */
+  if (hwloc_obj_type_is_io(type1)
+      && !hwloc_obj_type_is_io(type2) && type2 != HWLOC_OBJ_SYSTEM && type2 != HWLOC_OBJ_MACHINE)
+    return HWLOC_TYPE_UNORDERED;
+  if (hwloc_obj_type_is_io(type2)
+      && !hwloc_obj_type_is_io(type1) && type1 != HWLOC_OBJ_SYSTEM && type1 != HWLOC_OBJ_MACHINE)
+    return HWLOC_TYPE_UNORDERED;
+
+  return order1 - order2;
+}
+
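For reference, the "safer manner" the WARNING comment above wishes for is C99 designated initializers, which would keep obj_type_order[] indexed by type by construction; a sketch (not usable here since the code stays C89-compatible):

    static const unsigned obj_type_order_c99[] = {
      [HWLOC_OBJ_SYSTEM]     = 0,
      [HWLOC_OBJ_MACHINE]    = 1,
      [HWLOC_OBJ_GROUP]      = 2,
      [HWLOC_OBJ_NODE]       = 3,
      [HWLOC_OBJ_SOCKET]     = 4,
      [HWLOC_OBJ_CACHE]      = 5,
      [HWLOC_OBJ_CORE]       = 6,
      [HWLOC_OBJ_BRIDGE]     = 7,
      [HWLOC_OBJ_PCI_DEVICE] = 8,
      [HWLOC_OBJ_OS_DEVICE]  = 9,
      [HWLOC_OBJ_PU]         = 10,
      [HWLOC_OBJ_MISC]       = 11,
    };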
+static enum hwloc_type_cmp_e
+hwloc_type_cmp(hwloc_obj_t obj1, hwloc_obj_t obj2)
+{
+  hwloc_obj_type_t type1 = obj1->type;
+  hwloc_obj_type_t type2 = obj2->type;
+  int compare;
+
+  compare = hwloc_compare_types(type1, type2);
+  if (compare == HWLOC_TYPE_UNORDERED)
+    return HWLOC_TYPE_EQUAL; /* we cannot do better */
+  if (compare > 0)
+    return HWLOC_TYPE_DEEPER;
+  if (compare < 0)
+    return HWLOC_TYPE_HIGHER;
+
+  /* Caches have the same types but can have different depths.  */
+  if (type1 == HWLOC_OBJ_CACHE) {
+    if (obj1->attr->cache.depth < obj2->attr->cache.depth)
+      return HWLOC_TYPE_DEEPER;
+    else if (obj1->attr->cache.depth > obj2->attr->cache.depth)
+      return HWLOC_TYPE_HIGHER;
+    else if (obj1->attr->cache.type > obj2->attr->cache.type)
+      /* consider icache deeper than dcache and dcache deeper than unified */
+      return HWLOC_TYPE_DEEPER;
+    else if (obj1->attr->cache.type < obj2->attr->cache.type)
+      /* consider icache deeper than dcache and dcache deeper than unified */
+      return HWLOC_TYPE_HIGHER;
+  }
+
+  /* Group objects have the same types but can have different depths.  */
+  if (type1 == HWLOC_OBJ_GROUP) {
+    if (obj1->attr->group.depth == (unsigned) -1
+        || obj2->attr->group.depth == (unsigned) -1)
+      return HWLOC_TYPE_EQUAL;
+    if (obj1->attr->group.depth < obj2->attr->group.depth)
+      return HWLOC_TYPE_DEEPER;
+    else if (obj1->attr->group.depth > obj2->attr->group.depth)
+      return HWLOC_TYPE_HIGHER;
+  }
+
+  /* Bridge objects have the same types but can have different depths.  */
+  if (type1 == HWLOC_OBJ_BRIDGE) {
+    if (obj1->attr->bridge.depth < obj2->attr->bridge.depth)
+      return HWLOC_TYPE_DEEPER;
+    else if (obj1->attr->bridge.depth > obj2->attr->bridge.depth)
+      return HWLOC_TYPE_HIGHER;
+  }
+
+  return HWLOC_TYPE_EQUAL;
+}
+
+/*
+ * How to compare objects based on cpusets.
+ */
+
+enum hwloc_obj_cmp_e {
+  HWLOC_OBJ_EQUAL,      /**< \brief Equal */
+  HWLOC_OBJ_INCLUDED,   /**< \brief Strictly included into */
+  HWLOC_OBJ_CONTAINS,   /**< \brief Strictly contains */
+  HWLOC_OBJ_INTERSECTS, /**< \brief Intersects, but no inclusion! */
+  HWLOC_OBJ_DIFFERENT   /**< \brief No intersection */
+};
+
+static int
+hwloc_obj_cmp(hwloc_obj_t obj1, hwloc_obj_t obj2)
+{
+  hwloc_bitmap_t set1, set2;
+
+  /* compare cpusets if possible, or fallback to nodeset, or return */
+  if (obj1->cpuset && !hwloc_bitmap_iszero(obj1->cpuset)
+      && obj2->cpuset && !hwloc_bitmap_iszero(obj2->cpuset)) {
+    set1 = obj1->cpuset;
+    set2 = obj2->cpuset;
+  } else if (obj1->nodeset && !hwloc_bitmap_iszero(obj1->nodeset)
+             && obj2->nodeset && !hwloc_bitmap_iszero(obj2->nodeset)) {
+    set1 = obj1->nodeset;
+    set2 = obj2->nodeset;
+  } else {
+    return HWLOC_OBJ_DIFFERENT;
+  }
+
+  if (hwloc_bitmap_isequal(set1, set2)) {
+
+    /* Same sets, subsort by type to have a consistent ordering.  */
+
+    switch (hwloc_type_cmp(obj1, obj2)) {
+      case HWLOC_TYPE_DEEPER:
+        return HWLOC_OBJ_INCLUDED;
+      case HWLOC_TYPE_HIGHER:
+        return HWLOC_OBJ_CONTAINS;
+
+      case HWLOC_TYPE_EQUAL:
+        if (obj1->type == HWLOC_OBJ_MISC) {
+          /* Misc objects may vary by name */
+          int res = strcmp(obj1->name, obj2->name);
+          if (res < 0)
+            return HWLOC_OBJ_INCLUDED;
+          if (res > 0)
+            return HWLOC_OBJ_CONTAINS;
+          if (res == 0)
+            return HWLOC_OBJ_EQUAL;
+        }
+
+        /* Same sets and types!  Let's hope it's coherent.  */
+        return HWLOC_OBJ_EQUAL;
+    }
+
+    /* For dumb compilers */
+    abort();
+
+  } else {
+
+    /* Different sets, sort by inclusion.
*/ + + if (hwloc_bitmap_isincluded(set1, set2)) + return HWLOC_OBJ_INCLUDED; + + if (hwloc_bitmap_isincluded(set2, set1)) + return HWLOC_OBJ_CONTAINS; + + if (hwloc_bitmap_intersects(set1, set2)) + return HWLOC_OBJ_INTERSECTS; + + return HWLOC_OBJ_DIFFERENT; + } +} + +/* format must contain a single %s where to print obj infos */ +static void +hwloc___insert_object_by_cpuset_report_error(hwloc_report_error_t report_error, const char *fmt, hwloc_obj_t obj, int line) +{ + char typestr[64]; + char objstr[512]; + char msg[640]; + char *cpusetstr; + + hwloc_obj_type_snprintf(typestr, sizeof(typestr), obj, 0); + hwloc_bitmap_asprintf(&cpusetstr, obj->cpuset); + if (obj->os_index != (unsigned) -1) + snprintf(objstr, sizeof(objstr), "%s P#%u cpuset %s", + typestr, obj->os_index, cpusetstr); + else + snprintf(objstr, sizeof(objstr), "%s cpuset %s", + typestr, cpusetstr); + free(cpusetstr); + + snprintf(msg, sizeof(msg), fmt, + objstr); + report_error(msg, line); +} + +/* + * How to insert objects into the topology. + * + * Note: during detection, only the first_child and next_sibling pointers are + * kept up to date. Others are computed only once topology detection is + * complete. + */ + +#define merge_index(new, old, field, type) \ + if ((old)->field == (type) -1) \ + (old)->field = (new)->field; +#define merge_sizes(new, old, field) \ + if (!(old)->field) \ + (old)->field = (new)->field; +#ifdef HWLOC_DEBUG +#define check_sizes(new, old, field) \ + if ((new)->field) \ + assert((old)->field == (new)->field) +#else +#define check_sizes(new, old, field) +#endif + +/* Try to insert OBJ in CUR, recurse if needed. + * Returns the object if it was inserted, + * the remaining object it was merged, + * NULL if failed to insert. + */ +static struct hwloc_obj * +hwloc___insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t cur, hwloc_obj_t obj, + hwloc_report_error_t report_error) +{ + hwloc_obj_t child, container, *cur_children, *obj_children, next_child = NULL; + int put; + + /* Make sure we haven't gone too deep. */ + if (!hwloc_bitmap_isincluded(obj->cpuset, cur->cpuset)) { + fprintf(stderr,"recursion has gone too deep?!\n"); + return NULL; + } + + /* Check whether OBJ is included in some child. 
*/ + container = NULL; + for (child = cur->first_child; child; child = child->next_sibling) { + switch (hwloc_obj_cmp(obj, child)) { + case HWLOC_OBJ_EQUAL: + merge_index(obj, child, os_level, signed); + if (obj->os_level != child->os_level) { + static int reported = 0; + if (!reported && !hwloc_hide_errors()) { + fprintf(stderr, "Cannot merge similar %s objects with different OS levels %u and %u\n", + hwloc_obj_type_string(obj->type), child->os_level, obj->os_level); + reported = 1; + } + return NULL; + } + merge_index(obj, child, os_index, unsigned); + if (obj->os_index != child->os_index) { + static int reported = 0; + if (!reported && !hwloc_hide_errors()) { + fprintf(stderr, "Cannot merge similar %s objects with different OS indexes %u and %u\n", + hwloc_obj_type_string(obj->type), child->os_index, obj->os_index); + reported = 1; + } + return NULL; + } + if (obj->distances_count) { + if (child->distances_count) { + child->distances_count += obj->distances_count; + child->distances = realloc(child->distances, child->distances_count * sizeof(*child->distances)); + memcpy(child->distances + obj->distances_count, obj->distances, obj->distances_count * sizeof(*child->distances)); + free(obj->distances); + } else { + child->distances_count = obj->distances_count; + child->distances = obj->distances; + } + obj->distances_count = 0; + obj->distances = NULL; + } + if (obj->infos_count) { + if (child->infos_count) { + child->infos_count += obj->infos_count; + child->infos = realloc(child->infos, child->infos_count * sizeof(*child->infos)); + memcpy(child->infos + obj->infos_count, obj->infos, obj->infos_count * sizeof(*child->infos)); + free(obj->infos); + } else { + child->infos_count = obj->infos_count; + child->infos = obj->infos; + } + obj->infos_count = 0; + obj->infos = NULL; + } + if (obj->name) { + if (child->name) + free(child->name); + child->name = obj->name; + obj->name = NULL; + } + assert(!obj->userdata); /* user could not set userdata here (we're before load() */ + switch(obj->type) { + case HWLOC_OBJ_NODE: + /* Do not check these, it may change between calls */ + merge_sizes(obj, child, memory.local_memory); + merge_sizes(obj, child, memory.total_memory); + /* if both objects have a page_types array, just keep the biggest one for now */ + if (obj->memory.page_types_len && child->memory.page_types_len) + hwloc_debug("%s", "merging page_types by keeping the biggest one only\n"); + if (obj->memory.page_types_len < child->memory.page_types_len) { + free(obj->memory.page_types); + } else { + free(child->memory.page_types); + child->memory.page_types_len = obj->memory.page_types_len; + child->memory.page_types = obj->memory.page_types; + obj->memory.page_types = NULL; + obj->memory.page_types_len = 0; + } + break; + case HWLOC_OBJ_CACHE: + merge_sizes(obj, child, attr->cache.size); + check_sizes(obj, child, attr->cache.size); + merge_sizes(obj, child, attr->cache.linesize); + check_sizes(obj, child, attr->cache.linesize); + break; + default: + break; + } + /* Already present, no need to insert. */ + return child; + case HWLOC_OBJ_INCLUDED: + if (container) { + if (report_error) + hwloc___insert_object_by_cpuset_report_error(report_error, "object (%s) included in several different objects!", obj, __LINE__); + /* We can't handle that. */ + return NULL; + } + /* This child contains OBJ. 
*/
+      container = child;
+      break;
+    case HWLOC_OBJ_INTERSECTS:
+      if (report_error)
+        hwloc___insert_object_by_cpuset_report_error(report_error, "object (%s) intersection without inclusion!", obj, __LINE__);
+      /* We can't handle that.  */
+      return NULL;
+    case HWLOC_OBJ_CONTAINS:
+      /* OBJ will be above CHILD.  */
+      break;
+    case HWLOC_OBJ_DIFFERENT:
+      /* OBJ will be alongside CHILD.  */
+      break;
+    }
+  }
+
+  if (container) {
+    /* OBJ is strictly contained in some child of CUR, go deeper.  */
+    return hwloc___insert_object_by_cpuset(topology, container, obj, report_error);
+  }
+
+  /*
+   * Children of CUR are either completely different from or contained into
+   * OBJ. Take those that are contained (keeping sorting order), and sort OBJ
+   * along those that are different.
+   */
+
+  /* OBJ is not put yet.  */
+  put = 0;
+
+  /* These will always point to the pointer to their next last child.  */
+  cur_children = &cur->first_child;
+  obj_children = &obj->first_child;
+
+  /* Construct CUR's and OBJ's children list.  */
+
+  /* Iteration with prefetching to be completely safe against CHILD removal.  */
+  for (child = cur->first_child, child ? next_child = child->next_sibling : NULL;
+       child;
+       child = next_child, child ? next_child = child->next_sibling : NULL) {
+
+    switch (hwloc_obj_cmp(obj, child)) {
+
+    case HWLOC_OBJ_DIFFERENT:
+      /* Leave CHILD in CUR.  */
+      if (!put && (!child->cpuset || hwloc_bitmap_compare_first(obj->cpuset, child->cpuset) < 0)) {
+        /* Sort children by cpuset: put OBJ before CHILD in CUR's children.  */
+        *cur_children = obj;
+        cur_children = &obj->next_sibling;
+        put = 1;
+      }
+      /* Now put CHILD in CUR's children.  */
+      *cur_children = child;
+      cur_children = &child->next_sibling;
+      break;
+
+    case HWLOC_OBJ_CONTAINS:
+      /* OBJ contains CHILD, put the latter in the former.  */
+      *obj_children = child;
+      obj_children = &child->next_sibling;
+      break;
+
+    case HWLOC_OBJ_EQUAL:
+    case HWLOC_OBJ_INCLUDED:
+    case HWLOC_OBJ_INTERSECTS:
+      /* Shouldn't ever happen as we have handled them above.  */
+      abort();
+    }
+  }
+
+  /* Put OBJ last in CUR's children if not already done so.  */
+  if (!put) {
+    *cur_children = obj;
+    cur_children = &obj->next_sibling;
+  }
+
+  /* Close children lists.  */
+  *obj_children = NULL;
+  *cur_children = NULL;
+
+  return obj;
+}
+
+/* insertion routine that lets you change the error reporting callback */
+struct hwloc_obj *
+hwloc__insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t obj,
+                               hwloc_report_error_t report_error)
+{
+  struct hwloc_obj *result;
+  /* Start at the top.  */
+  /* Add the cpuset to the top */
+  hwloc_bitmap_or(topology->levels[0][0]->complete_cpuset, topology->levels[0][0]->complete_cpuset, obj->cpuset);
+  if (obj->nodeset)
+    hwloc_bitmap_or(topology->levels[0][0]->complete_nodeset, topology->levels[0][0]->complete_nodeset, obj->nodeset);
+  result = hwloc___insert_object_by_cpuset(topology, topology->levels[0][0], obj, report_error);
+  if (result != obj)
+    /* either failed to insert, or got merged, free the original object */
+    hwloc_free_unlinked_object(obj);
+  return result;
+}
+
+/* the default insertion routine warns in case of error.
+ * it's used by most backends */ +struct hwloc_obj * +hwloc_insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t obj) +{ + return hwloc__insert_object_by_cpuset(topology, obj, hwloc_report_os_error); +} + +void +hwloc_insert_object_by_parent(struct hwloc_topology *topology, hwloc_obj_t parent, hwloc_obj_t obj) +{ + hwloc_obj_t child, next_child = obj->first_child; + hwloc_obj_t *current; + + /* Append to the end of the list */ + for (current = &parent->first_child; *current; current = &(*current)->next_sibling) { + hwloc_bitmap_t curcpuset = (*current)->cpuset; + if (obj->cpuset && (!curcpuset || hwloc_bitmap_compare_first(obj->cpuset, curcpuset) < 0)) { + static int reported = 0; + if (!reported && !hwloc_hide_errors()) { + char *a = "NULL", *b; + if (curcpuset) + hwloc_bitmap_asprintf(&a, curcpuset); + hwloc_bitmap_asprintf(&b, obj->cpuset); + fprintf(stderr, "****************************************************************************\n"); + fprintf(stderr, "* hwloc has encountered an out-of-order topology discovery.\n"); + fprintf(stderr, "* An object with cpuset %s was inserted after object with %s\n", b, a); + fprintf(stderr, "* Please check that your input topology (XML file, etc.) is valid.\n"); + fprintf(stderr, "****************************************************************************\n"); + if (curcpuset) + free(a); + free(b); + reported = 1; + } + } + } + + *current = obj; + obj->next_sibling = NULL; + obj->first_child = NULL; + + /* Use the new object to insert children */ + parent = obj; + + /* Recursively insert children below */ + while (next_child) { + child = next_child; + next_child = child->next_sibling; + hwloc_insert_object_by_parent(topology, parent, child); + } + + if (obj->type == HWLOC_OBJ_MISC) { + /* misc objects go in no level (needed here because level building doesn't see Misc objects inside I/O trees) */ + obj->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN; + } +} + +/* Adds a misc object _after_ detection, and thus has to reconnect all the pointers */ +hwloc_obj_t +hwloc_topology_insert_misc_object_by_cpuset(struct hwloc_topology *topology, hwloc_const_bitmap_t cpuset, const char *name) +{ + hwloc_obj_t obj, child; + + if (!topology->is_loaded) { + errno = EINVAL; + return NULL; + } + + if (hwloc_bitmap_iszero(cpuset)) + return NULL; + if (!hwloc_bitmap_isincluded(cpuset, hwloc_topology_get_topology_cpuset(topology))) + return NULL; + + obj = hwloc_alloc_setup_object(HWLOC_OBJ_MISC, -1); + if (name) + obj->name = strdup(name); + + /* misc objects go in no level */ + obj->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN; + + obj->cpuset = hwloc_bitmap_dup(cpuset); + /* initialize default cpusets, we'll adjust them later */ + obj->complete_cpuset = hwloc_bitmap_dup(cpuset); + obj->allowed_cpuset = hwloc_bitmap_dup(cpuset); + obj->online_cpuset = hwloc_bitmap_dup(cpuset); + + obj = hwloc__insert_object_by_cpuset(topology, obj, NULL /* do not show errors on stdout */); + if (!obj) + return NULL; + + hwloc_connect_children(topology->levels[0][0]); + + if ((child = obj->first_child) != NULL && child->cpuset) { + /* keep the main cpuset untouched, but update other cpusets and nodesets from children */ + obj->nodeset = hwloc_bitmap_alloc(); + obj->complete_nodeset = hwloc_bitmap_alloc(); + obj->allowed_nodeset = hwloc_bitmap_alloc(); + while (child) { + if (child->complete_cpuset) + hwloc_bitmap_or(obj->complete_cpuset, obj->complete_cpuset, child->complete_cpuset); + if (child->allowed_cpuset) + hwloc_bitmap_or(obj->allowed_cpuset, 
obj->allowed_cpuset, child->allowed_cpuset);
+      if (child->online_cpuset)
+        hwloc_bitmap_or(obj->online_cpuset, obj->online_cpuset, child->online_cpuset);
+      if (child->nodeset)
+        hwloc_bitmap_or(obj->nodeset, obj->nodeset, child->nodeset);
+      if (child->complete_nodeset)
+        hwloc_bitmap_or(obj->complete_nodeset, obj->complete_nodeset, child->complete_nodeset);
+      if (child->allowed_nodeset)
+        hwloc_bitmap_or(obj->allowed_nodeset, obj->allowed_nodeset, child->allowed_nodeset);
+      child = child->next_sibling;
+    }
+  } else {
+    /* copy the parent nodesets */
+    obj->nodeset = hwloc_bitmap_dup(obj->parent->nodeset);
+    obj->complete_nodeset = hwloc_bitmap_dup(obj->parent->complete_nodeset);
+    obj->allowed_nodeset = hwloc_bitmap_dup(obj->parent->allowed_nodeset);
+  }
+
+  return obj;
+}
+
+hwloc_obj_t
+hwloc_topology_insert_misc_object_by_parent(struct hwloc_topology *topology, hwloc_obj_t parent, const char *name)
+{
+  hwloc_obj_t obj = hwloc_alloc_setup_object(HWLOC_OBJ_MISC, -1);
+  if (name)
+    obj->name = strdup(name);
+
+  if (!topology->is_loaded) {
+    hwloc_free_unlinked_object(obj);
+    errno = EINVAL;
+    return NULL;
+  }
+
+  hwloc_insert_object_by_parent(topology, parent, obj);
+
+  hwloc_connect_children(topology->levels[0][0]);
+  /* no need to hwloc_connect_levels() since misc objects are not in levels */
+
+  return obj;
+}
+
+/* Traverse children of a parent in a safe way: reread the next pointer as
+ * appropriate to prevent crash on child deletion:  */
+#define for_each_child_safe(child, parent, pchild) \
+  for (pchild = &(parent)->first_child, child = *pchild; \
+       child; \
+       /* Check whether the current child was not dropped.  */ \
+       (*pchild == child ? pchild = &(child->next_sibling) : NULL), \
+       /* Get pointer to next child.  */ \
+       child = *pchild)
+
+/* Append I/O devices below this object to their list */
+static void
+append_iodevs(hwloc_topology_t topology, hwloc_obj_t obj)
+{
+  hwloc_obj_t child, *temp;
+
+  /* make sure we don't have remaining stale pointers from a previous load */
+  obj->next_cousin = NULL;
+  obj->prev_cousin = NULL;
+
+  if (obj->type == HWLOC_OBJ_BRIDGE) {
+    obj->depth = HWLOC_TYPE_DEPTH_BRIDGE;
+    /* Insert in the main bridge list */
+    if (topology->first_bridge) {
+      obj->prev_cousin = topology->last_bridge;
+      obj->prev_cousin->next_cousin = obj;
+      topology->last_bridge = obj;
+    } else {
+      topology->first_bridge = topology->last_bridge = obj;
+    }
+  } else if (obj->type == HWLOC_OBJ_PCI_DEVICE) {
+    obj->depth = HWLOC_TYPE_DEPTH_PCI_DEVICE;
+    /* Insert in the main pcidev list */
+    if (topology->first_pcidev) {
+      obj->prev_cousin = topology->last_pcidev;
+      obj->prev_cousin->next_cousin = obj;
+      topology->last_pcidev = obj;
+    } else {
+      topology->first_pcidev = topology->last_pcidev = obj;
+    }
+  } else if (obj->type == HWLOC_OBJ_OS_DEVICE) {
+    obj->depth = HWLOC_TYPE_DEPTH_OS_DEVICE;
+    /* Insert in the main osdev list */
+    if (topology->first_osdev) {
+      obj->prev_cousin = topology->last_osdev;
+      obj->prev_cousin->next_cousin = obj;
+      topology->last_osdev = obj;
+    } else {
+      topology->first_osdev = topology->last_osdev = obj;
+    }
+  }
+
+  for_each_child_safe(child, obj, temp)
+    append_iodevs(topology, child);
+}
+
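Usage pattern for the for_each_child_safe() macro defined above: the current child may be unlinked while iterating without breaking the traversal (sketch; should_drop() is a hypothetical predicate, and unlink_and_free_single_object() is defined further below in this file):

    static void drop_some_children(hwloc_obj_t parent)
    {
      hwloc_obj_t child, *pchild;
      for_each_child_safe(child, parent, pchild)
        if (should_drop(child))                    /* hypothetical predicate */
          unlink_and_free_single_object(pchild);   /* replaces child with its children */
    }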
+static int hwloc_memory_page_type_compare(const void *_a, const void *_b)
+{
+  const struct hwloc_obj_memory_page_type_s *a = _a;
+  const struct hwloc_obj_memory_page_type_s *b = _b;
+  /* consider 0 as larger so that 0-size page_types go to the end */
+  if (!b->size)
+    return -1;
+  /* don't cast a-b in int since those are ullongs */
+  if (b->size == a->size)
+    return 0;
+  return a->size < b->size ? -1 : 1;
+}
+
+/* Propagate memory counts */
+static void
+propagate_total_memory(hwloc_obj_t obj)
+{
+  hwloc_obj_t *temp, child;
+  unsigned i;
+
+  /* reset total before counting local and children memory */
+  obj->memory.total_memory = 0;
+
+  /* Propagate memory up */
+  for_each_child_safe(child, obj, temp) {
+    propagate_total_memory(child);
+    obj->memory.total_memory += child->memory.total_memory;
+  }
+  obj->memory.total_memory += obj->memory.local_memory;
+
+  /* By the way, sort the page_type array.
+   * Cannot do it on insert since some backends (e.g. XML) add page_types after inserting the object.
+   */
+  qsort(obj->memory.page_types, obj->memory.page_types_len, sizeof(*obj->memory.page_types), hwloc_memory_page_type_compare);
+  /* Ignore 0-size page_types, they are at the end */
+  for(i=obj->memory.page_types_len; i>=1; i--)
+    if (obj->memory.page_types[i-1].size)
+      break;
+  obj->memory.page_types_len = i;
+}
+
+/* Collect the cpuset of all the PU objects. */
+static void
+collect_proc_cpuset(hwloc_obj_t obj, hwloc_obj_t sys)
+{
+  hwloc_obj_t child, *temp;
+
+  if (sys) {
+    /* We are already given a pointer to a system object */
+    if (obj->type == HWLOC_OBJ_PU)
+      hwloc_bitmap_or(sys->cpuset, sys->cpuset, obj->cpuset);
+  } else {
+    if (obj->cpuset) {
+      /* This object is the root of a machine */
+      sys = obj;
+      /* Assume no PU for now */
+      hwloc_bitmap_zero(obj->cpuset);
+    }
+  }
+
+  for_each_child_safe(child, obj, temp)
+    collect_proc_cpuset(child, sys);
+}
+
+/* While traversing down and up, propagate the offline/disallowed cpus by
+ * and'ing them to and from the first object that has a cpuset */
+static void
+propagate_unused_cpuset(hwloc_obj_t obj, hwloc_obj_t sys)
+{
+  hwloc_obj_t child, *temp;
+
+  if (obj->cpuset) {
+    if (sys) {
+      /* We are already given a pointer to a system object, update it and update ourselves */
+      hwloc_bitmap_t mask = hwloc_bitmap_alloc();
+
+      /* Apply the topology cpuset */
+      hwloc_bitmap_and(obj->cpuset, obj->cpuset, sys->cpuset);
+
+      /* Update complete cpuset down */
+      if (obj->complete_cpuset) {
+        hwloc_bitmap_and(obj->complete_cpuset, obj->complete_cpuset, sys->complete_cpuset);
+      } else {
+        obj->complete_cpuset = hwloc_bitmap_dup(sys->complete_cpuset);
+        hwloc_bitmap_and(obj->complete_cpuset, obj->complete_cpuset, obj->cpuset);
+      }
+
+      /* Update online cpusets */
+      if (obj->online_cpuset) {
+        /* Update ours */
+        hwloc_bitmap_and(obj->online_cpuset, obj->online_cpuset, sys->online_cpuset);
+
+        /* Update the given cpuset, but only what we know */
+        hwloc_bitmap_copy(mask, obj->cpuset);
+        hwloc_bitmap_not(mask, mask);
+        hwloc_bitmap_or(mask, mask, obj->online_cpuset);
+        hwloc_bitmap_and(sys->online_cpuset, sys->online_cpuset, mask);
+      } else {
+        /* Just take it as such */
+        obj->online_cpuset = hwloc_bitmap_dup(sys->online_cpuset);
+        hwloc_bitmap_and(obj->online_cpuset, obj->online_cpuset, obj->cpuset);
+      }
+
+      /* Update allowed cpusets */
+      if (obj->allowed_cpuset) {
+        /* Update ours */
+        hwloc_bitmap_and(obj->allowed_cpuset, obj->allowed_cpuset, sys->allowed_cpuset);
+
+        /* Update the given cpuset, but only what we know */
+        hwloc_bitmap_copy(mask, obj->cpuset);
+        hwloc_bitmap_not(mask, mask);
+        hwloc_bitmap_or(mask, mask, obj->allowed_cpuset);
+        hwloc_bitmap_and(sys->allowed_cpuset, sys->allowed_cpuset, mask);
+      } else {
+        /* Just take it as such */
+        obj->allowed_cpuset = hwloc_bitmap_dup(sys->allowed_cpuset);
+        hwloc_bitmap_and(obj->allowed_cpuset, obj->allowed_cpuset, obj->cpuset);
+      }
+
+      hwloc_bitmap_free(mask);
+ } else { + /* This object is the root of a machine */ + sys = obj; + /* Apply complete cpuset to cpuset, online_cpuset and allowed_cpuset, it + * will automatically be applied below */ + if (obj->complete_cpuset) + hwloc_bitmap_and(obj->cpuset, obj->cpuset, obj->complete_cpuset); + else + obj->complete_cpuset = hwloc_bitmap_dup(obj->cpuset); + if (obj->online_cpuset) + hwloc_bitmap_and(obj->online_cpuset, obj->online_cpuset, obj->complete_cpuset); + else + obj->online_cpuset = hwloc_bitmap_dup(obj->complete_cpuset); + if (obj->allowed_cpuset) + hwloc_bitmap_and(obj->allowed_cpuset, obj->allowed_cpuset, obj->complete_cpuset); + else + obj->allowed_cpuset = hwloc_bitmap_dup(obj->complete_cpuset); + } + } + + for_each_child_safe(child, obj, temp) + propagate_unused_cpuset(child, sys); +} + +/* Force full nodeset for non-NUMA machines */ +static void +add_default_object_sets(hwloc_obj_t obj, int parent_has_sets) +{ + hwloc_obj_t child, *temp; + + /* I/O devices (and their children) have no sets */ + if (hwloc_obj_type_is_io(obj->type)) + return; + + if (parent_has_sets && obj->type != HWLOC_OBJ_MISC) { + /* non-MISC object must have cpuset if parent has one. */ + assert(obj->cpuset); + } + + /* other sets must be consistent with main cpuset: + * check cpusets and add nodesets if needed. + * + * MISC may have no sets at all (if added by parent), or usual ones (if added by cpuset), + * but that's not easy to detect, so just make sure sets are consistent as usual. + */ + if (obj->cpuset) { + assert(obj->online_cpuset); + assert(obj->complete_cpuset); + assert(obj->allowed_cpuset); + if (!obj->nodeset) + obj->nodeset = hwloc_bitmap_alloc_full(); + if (!obj->complete_nodeset) + obj->complete_nodeset = hwloc_bitmap_alloc_full(); + if (!obj->allowed_nodeset) + obj->allowed_nodeset = hwloc_bitmap_alloc_full(); + } else { + assert(!obj->online_cpuset); + assert(!obj->complete_cpuset); + assert(!obj->allowed_cpuset); + assert(!obj->nodeset); + assert(!obj->complete_nodeset); + assert(!obj->allowed_nodeset); + } + + for_each_child_safe(child, obj, temp) + add_default_object_sets(child, obj->cpuset != NULL); +} + +/* Setup object cpusets/nodesets by OR'ing its children. 
*/ +HWLOC_DECLSPEC int +hwloc_fill_object_sets(hwloc_obj_t obj) +{ + hwloc_obj_t child; + assert(obj->cpuset != NULL); + child = obj->first_child; + while (child) { + assert(child->cpuset != NULL); + if (child->complete_cpuset) { + if (!obj->complete_cpuset) + obj->complete_cpuset = hwloc_bitmap_alloc(); + hwloc_bitmap_or(obj->complete_cpuset, obj->complete_cpuset, child->complete_cpuset); + } + if (child->online_cpuset) { + if (!obj->online_cpuset) + obj->online_cpuset = hwloc_bitmap_alloc(); + hwloc_bitmap_or(obj->online_cpuset, obj->online_cpuset, child->online_cpuset); + } + if (child->allowed_cpuset) { + if (!obj->allowed_cpuset) + obj->allowed_cpuset = hwloc_bitmap_alloc(); + hwloc_bitmap_or(obj->allowed_cpuset, obj->allowed_cpuset, child->allowed_cpuset); + } + if (child->nodeset) { + if (!obj->nodeset) + obj->nodeset = hwloc_bitmap_alloc(); + hwloc_bitmap_or(obj->nodeset, obj->nodeset, child->nodeset); + } + if (child->complete_nodeset) { + if (!obj->complete_nodeset) + obj->complete_nodeset = hwloc_bitmap_alloc(); + hwloc_bitmap_or(obj->complete_nodeset, obj->complete_nodeset, child->complete_nodeset); + } + if (child->allowed_nodeset) { + if (!obj->allowed_nodeset) + obj->allowed_nodeset = hwloc_bitmap_alloc(); + hwloc_bitmap_or(obj->allowed_nodeset, obj->allowed_nodeset, child->allowed_nodeset); + } + child = child->next_sibling; + } + return 0; +} + +/* Propagate nodesets up and down */ +static void +propagate_nodeset(hwloc_obj_t obj, hwloc_obj_t sys) +{ + hwloc_obj_t child, *temp; + hwloc_bitmap_t parent_nodeset = NULL; + int parent_weight = 0; + + if (!sys && obj->nodeset) { + sys = obj; + if (!obj->complete_nodeset) + obj->complete_nodeset = hwloc_bitmap_dup(obj->nodeset); + if (!obj->allowed_nodeset) + obj->allowed_nodeset = hwloc_bitmap_dup(obj->complete_nodeset); + } + + if (sys) { + if (obj->nodeset) { + /* Some existing nodeset coming from above, to possibly propagate down */ + parent_nodeset = obj->nodeset; + parent_weight = hwloc_bitmap_weight(parent_nodeset); + } else + obj->nodeset = hwloc_bitmap_alloc(); + } + + for_each_child_safe(child, obj, temp) { + /* don't propagate nodesets in I/O objects, keep them NULL */ + if (hwloc_obj_type_is_io(child->type)) + return; + /* don't propagate nodesets in Misc inserted by parent (no nodeset if no cpuset) */ + if (child->type == HWLOC_OBJ_MISC && !child->cpuset) + return; + + /* Propagate singleton nodesets down */ + if (parent_weight == 1) { + if (!child->nodeset) + child->nodeset = hwloc_bitmap_dup(obj->nodeset); + else if (!hwloc_bitmap_isequal(child->nodeset, parent_nodeset)) { + hwloc_debug_bitmap("Oops, parent nodeset %s", parent_nodeset); + hwloc_debug_bitmap(" is different from child nodeset %s, ignoring the child one\n", child->nodeset); + hwloc_bitmap_copy(child->nodeset, parent_nodeset); + } + } + + /* Recurse */ + propagate_nodeset(child, sys); + + /* Propagate children nodesets up */ + if (sys && child->nodeset) + hwloc_bitmap_or(obj->nodeset, obj->nodeset, child->nodeset); + } +} + +/* Propagate allowed and complete nodesets */ +static void +propagate_nodesets(hwloc_obj_t obj) +{ + hwloc_bitmap_t mask = hwloc_bitmap_alloc(); + hwloc_obj_t child, *temp; + + for_each_child_safe(child, obj, temp) { + /* don't propagate nodesets in I/O objects, keep them NULL */ + if (hwloc_obj_type_is_io(child->type)) + continue; + + if (obj->nodeset) { + /* Update complete nodesets down */ + if (child->complete_nodeset) { + hwloc_bitmap_and(child->complete_nodeset, child->complete_nodeset, obj->complete_nodeset); + } else if 
(child->nodeset) {
+        child->complete_nodeset = hwloc_bitmap_dup(obj->complete_nodeset);
+        hwloc_bitmap_and(child->complete_nodeset, child->complete_nodeset, child->nodeset);
+      } /* else the child doesn't have nodeset information, we can not provide a complete nodeset */
+
+      /* Update allowed nodesets down */
+      if (child->allowed_nodeset) {
+        hwloc_bitmap_and(child->allowed_nodeset, child->allowed_nodeset, obj->allowed_nodeset);
+      } else if (child->nodeset) {
+        child->allowed_nodeset = hwloc_bitmap_dup(obj->allowed_nodeset);
+        hwloc_bitmap_and(child->allowed_nodeset, child->allowed_nodeset, child->nodeset);
+      }
+    }
+
+    propagate_nodesets(child);
+
+    if (obj->nodeset) {
+      /* Update allowed nodesets up */
+      if (child->nodeset && child->allowed_nodeset) {
+        hwloc_bitmap_copy(mask, child->nodeset);
+        hwloc_bitmap_andnot(mask, mask, child->allowed_nodeset);
+        hwloc_bitmap_andnot(obj->allowed_nodeset, obj->allowed_nodeset, mask);
+      }
+    }
+  }
+  hwloc_bitmap_free(mask);
+
+  if (obj->nodeset) {
+    /* Apply complete nodeset to nodeset and allowed_nodeset */
+    if (obj->complete_nodeset)
+      hwloc_bitmap_and(obj->nodeset, obj->nodeset, obj->complete_nodeset);
+    else
+      obj->complete_nodeset = hwloc_bitmap_dup(obj->nodeset);
+    if (obj->allowed_nodeset)
+      hwloc_bitmap_and(obj->allowed_nodeset, obj->allowed_nodeset, obj->complete_nodeset);
+    else
+      obj->allowed_nodeset = hwloc_bitmap_dup(obj->complete_nodeset);
+  }
+}
+
+static void
+apply_nodeset(hwloc_obj_t obj, hwloc_obj_t sys)
+{
+  unsigned i;
+  hwloc_obj_t child, *temp;
+
+  if (sys) {
+    if (obj->type == HWLOC_OBJ_NODE && obj->os_index != (unsigned) -1 &&
+        !hwloc_bitmap_isset(sys->allowed_nodeset, obj->os_index)) {
+      hwloc_debug("Dropping memory from disallowed node %u\n", obj->os_index);
+      obj->memory.local_memory = 0;
+      obj->memory.total_memory = 0;
+      for(i=0; i<obj->memory.page_types_len; i++)
+        obj->memory.page_types[i].count = 0;
+    }
+  } else {
+    if (obj->allowed_nodeset) {
+      sys = obj;
+    }
+  }
+
+  for_each_child_safe(child, obj, temp)
+    apply_nodeset(child, sys);
+}
+
+static void
+remove_unused_cpusets(hwloc_obj_t obj)
+{
+  hwloc_obj_t child, *temp;
+
+  if (obj->cpuset) {
+    hwloc_bitmap_and(obj->cpuset, obj->cpuset, obj->online_cpuset);
+    hwloc_bitmap_and(obj->cpuset, obj->cpuset, obj->allowed_cpuset);
+  }
+
+  for_each_child_safe(child, obj, temp)
+    remove_unused_cpusets(child);
+}
+
+/* Remove an object from its parent and free it.
+ * Only updates next_sibling/first_child pointers,
+ * so may only be used during early discovery.
+ * Children are inserted where the object was.
+ */
+static void
+unlink_and_free_single_object(hwloc_obj_t *pparent)
+{
+  hwloc_obj_t parent = *pparent;
+  hwloc_obj_t child = parent->first_child;
+  /* Replace object with its list of children */
+  if (child) {
+    *pparent = child;
+    while (child->next_sibling)
+      child = child->next_sibling;
+    child->next_sibling = parent->next_sibling;
+  } else
+    *pparent = parent->next_sibling;
+  hwloc_free_unlinked_object(parent);
+}
+
+/* Remove all ignored objects.
*/ +static int +remove_ignored(hwloc_topology_t topology, hwloc_obj_t *pparent) +{ + hwloc_obj_t parent = *pparent, child, *pchild; + int dropped_children = 0; + int dropped = 0; + + for_each_child_safe(child, parent, pchild) + dropped_children += remove_ignored(topology, pchild); + + if ((parent != topology->levels[0][0] && + topology->ignored_types[parent->type] == HWLOC_IGNORE_TYPE_ALWAYS) + || (parent->type == HWLOC_OBJ_CACHE && parent->attr->cache.type == HWLOC_OBJ_CACHE_INSTRUCTION + && !(topology->flags & HWLOC_TOPOLOGY_FLAG_ICACHES))) { + hwloc_debug("%s", "\nDropping ignored object "); + print_object(topology, 0, parent); + unlink_and_free_single_object(pparent); + dropped = 1; + + } else if (dropped_children) { + /* we keep this object but its children changed, reorder them by cpuset */ + + /* move the children list on the side */ + hwloc_obj_t *prev, children = parent->first_child; + parent->first_child = NULL; + while (children) { + /* dequeue child */ + child = children; + children = child->next_sibling; + /* find where to enqueue it */ + prev = &parent->first_child; + while (*prev + && (!child->cpuset || !(*prev)->cpuset + || hwloc_bitmap_compare_first(child->cpuset, (*prev)->cpuset) > 0)) + prev = &((*prev)->next_sibling); + /* enqueue */ + child->next_sibling = *prev; + *prev = child; + } + } + + return dropped; +} + +/* Remove an object and its children from its parent and free them. + * Only updates next_sibling/first_child pointers, + * so may only be used during early discovery. + */ +static void +unlink_and_free_object_and_children(hwloc_obj_t *pobj) +{ + hwloc_obj_t obj = *pobj, child, *pchild; + + for_each_child_safe(child, obj, pchild) + unlink_and_free_object_and_children(pchild); + + *pobj = obj->next_sibling; + hwloc_free_unlinked_object(obj); +} + +/* Remove all children whose cpuset is empty, except NUMA nodes + * since we want to keep memory information, and except PCI bridges and devices. 
+ */ +static void +remove_empty(hwloc_topology_t topology, hwloc_obj_t *pobj) +{ + hwloc_obj_t obj = *pobj, child, *pchild; + + for_each_child_safe(child, obj, pchild) + remove_empty(topology, pchild); + + if (obj->type != HWLOC_OBJ_NODE + && !obj->first_child /* only remove if all children were removed above, so that we don't remove parents of NUMAnode */ + && !hwloc_obj_type_is_io(obj->type) && obj->type != HWLOC_OBJ_MISC + && obj->cpuset /* don't remove if no cpuset at all, there's likely a good reason why it's different from having an empty cpuset */ + && hwloc_bitmap_iszero(obj->cpuset)) { + /* Remove empty children */ + hwloc_debug("%s", "\nRemoving empty object "); + print_object(topology, 0, obj); + unlink_and_free_single_object(pobj); + } +} + +/* adjust object cpusets according the given droppedcpuset, + * drop object whose cpuset becomes empty, + * and mark dropped nodes in droppednodeset + */ +static void +restrict_object(hwloc_topology_t topology, unsigned long flags, hwloc_obj_t *pobj, hwloc_const_cpuset_t droppedcpuset, hwloc_nodeset_t droppednodeset, int droppingparent) +{ + hwloc_obj_t obj = *pobj, child, *pchild; + int dropping; + int modified = obj->complete_cpuset && hwloc_bitmap_intersects(obj->complete_cpuset, droppedcpuset); + + hwloc_clear_object_distances(obj); + + if (obj->cpuset) + hwloc_bitmap_andnot(obj->cpuset, obj->cpuset, droppedcpuset); + if (obj->complete_cpuset) + hwloc_bitmap_andnot(obj->complete_cpuset, obj->complete_cpuset, droppedcpuset); + if (obj->online_cpuset) + hwloc_bitmap_andnot(obj->online_cpuset, obj->online_cpuset, droppedcpuset); + if (obj->allowed_cpuset) + hwloc_bitmap_andnot(obj->allowed_cpuset, obj->allowed_cpuset, droppedcpuset); + + if (obj->type == HWLOC_OBJ_MISC) { + dropping = droppingparent && !(flags & HWLOC_RESTRICT_FLAG_ADAPT_MISC); + } else if (hwloc_obj_type_is_io(obj->type)) { + dropping = droppingparent && !(flags & HWLOC_RESTRICT_FLAG_ADAPT_IO); + } else { + dropping = droppingparent || (obj->cpuset && hwloc_bitmap_iszero(obj->cpuset)); + } + + if (modified) + for_each_child_safe(child, obj, pchild) + restrict_object(topology, flags, pchild, droppedcpuset, droppednodeset, dropping); + + if (dropping) { + hwloc_debug("%s", "\nRemoving object during restrict"); + print_object(topology, 0, obj); + if (obj->type == HWLOC_OBJ_NODE) + hwloc_bitmap_set(droppednodeset, obj->os_index); + /* remove the object from the tree (no need to remove from levels, they will be entirely rebuilt by the caller) */ + unlink_and_free_single_object(pobj); + /* do not remove children. 
if they were to be removed, they would have been already */
+  }
+}
+
+/* adjust object nodesets according to the given droppednodeset
+ */
+static void
+restrict_object_nodeset(hwloc_topology_t topology, hwloc_obj_t *pobj, hwloc_nodeset_t droppednodeset)
+{
+  hwloc_obj_t obj = *pobj, child, *pchild;
+
+  /* if this object isn't modified, don't bother looking at children */
+  if (obj->complete_nodeset && !hwloc_bitmap_intersects(obj->complete_nodeset, droppednodeset))
+    return;
+
+  if (obj->nodeset)
+    hwloc_bitmap_andnot(obj->nodeset, obj->nodeset, droppednodeset);
+  if (obj->complete_nodeset)
+    hwloc_bitmap_andnot(obj->complete_nodeset, obj->complete_nodeset, droppednodeset);
+  if (obj->allowed_nodeset)
+    hwloc_bitmap_andnot(obj->allowed_nodeset, obj->allowed_nodeset, droppednodeset);
+
+  for_each_child_safe(child, obj, pchild)
+    restrict_object_nodeset(topology, pchild, droppednodeset);
+}
+
+/* we don't want to merge groups that were inserted explicitly with the custom interface */
+static int
+can_merge_group(hwloc_topology_t topology, hwloc_obj_t obj)
+{
+  const char *value;
+  /* custom-inserted groups are in custom topologies and have no cpusets,
+   * don't bother calling hwloc_obj_get_info_by_name() and strcmp() uselessly.
+   */
+  if (!topology->backends->is_custom || obj->cpuset)
+    return 1;
+  value = hwloc_obj_get_info_by_name(obj, "Backend");
+  return (!value) || strcmp(value, "Custom");
+}
+
+/*
+ * Merge with the only child if either the parent or the child has a type to be
+ * ignored while keeping structure
+ */
+static void
+merge_useless_child(hwloc_topology_t topology, hwloc_obj_t *pparent)
+{
+  hwloc_obj_t parent = *pparent, child, *pchild, ios;
+  int replacechild = 0, replaceparent = 0;
+
+  for_each_child_safe(child, parent, pchild)
+    merge_useless_child(topology, pchild);
+
+  child = parent->first_child;
+  if (!child)
+    /* There is no child, nothing to merge.  */
+    return;
+
+  if (child->next_sibling && !hwloc_obj_type_is_io(child->next_sibling->type))
+    /* There are several non-I/O children */
+    return;
+
+  /* There is one non-I/O child and possibly some I/O children.
+   * I/O children shouldn't prevent merging because they can be attached
+   * to anything with the same locality.
+   * Move them to the side during merging, and append them back later.
+   * This is easy because I/O children are always last in the list.
+   */
+  ios = child->next_sibling;
+  child->next_sibling = NULL;
+
+  /* Check whether parent and/or child can be replaced */
+  if (topology->ignored_types[parent->type] == HWLOC_IGNORE_TYPE_KEEP_STRUCTURE) {
+    if (parent->type != HWLOC_OBJ_GROUP || can_merge_group(topology, parent))
+      /* Parent can be ignored in favor of the child.  */
+      replaceparent = 1;
+  }
+  if (topology->ignored_types[child->type] == HWLOC_IGNORE_TYPE_KEEP_STRUCTURE) {
+    if (child->type != HWLOC_OBJ_GROUP || can_merge_group(topology, child))
+      /* Child can be ignored in favor of the parent.
*/ + replacechild = 1; + } + + /* Decide which one to actually replace */ + if (replaceparent && replacechild) { + /* If both may be replaced, look at obj_type_priority */ + if (obj_type_priority[parent->type] > obj_type_priority[child->type]) + replaceparent = 0; + else + replacechild = 0; + } + + if (replaceparent) { + /* Replace parent with child */ + hwloc_debug("%s", "\nIgnoring parent "); + print_object(topology, 0, parent); + if (parent == topology->levels[0][0]) { + child->parent = NULL; + child->depth = 0; + } + *pparent = child; + child->next_sibling = parent->next_sibling; + hwloc_free_unlinked_object(parent); + + } else if (replacechild) { + /* Replace child with parent */ + hwloc_debug("%s", "\nIgnoring child "); + print_object(topology, 0, child); + parent->first_child = child->first_child; + hwloc_free_unlinked_object(child); + } + + if (ios) { + /* append I/O children to the list of children of the remaining object */ + pchild = &((*pparent)->first_child); + while (*pchild) + pchild = &((*pchild)->next_sibling); + *pchild = ios; + } +} + +static void +hwloc_drop_all_io(hwloc_topology_t topology, hwloc_obj_t root) +{ + hwloc_obj_t child, *pchild; + for_each_child_safe(child, root, pchild) { + if (hwloc_obj_type_is_io(child->type)) + unlink_and_free_object_and_children(pchild); + else + hwloc_drop_all_io(topology, child); + } +} + +/* + * If IO_DEVICES and WHOLE_IO are not set, we drop everything. + * If WHOLE_IO is not set, we drop non-interesting devices, + * and bridges that have no children. + * If IO_BRIDGES is also not set, we also drop all bridges + * except the hostbridges. + */ +static void +hwloc_drop_useless_io(hwloc_topology_t topology, hwloc_obj_t root) +{ + hwloc_obj_t child, *pchild; + + if (!(topology->flags & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) { + /* drop all I/O children */ + hwloc_drop_all_io(topology, root); + return; + } + + if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_WHOLE_IO)) { + /* drop non-interesting devices */ + for_each_child_safe(child, root, pchild) { + if (child->type == HWLOC_OBJ_PCI_DEVICE) { + unsigned classid = child->attr->pcidev.class_id; + unsigned baseclass = classid >> 8; + if (baseclass != 0x03 /* PCI_BASE_CLASS_DISPLAY */ + && baseclass != 0x02 /* PCI_BASE_CLASS_NETWORK */ + && baseclass != 0x01 /* PCI_BASE_CLASS_STORAGE */ + && baseclass != 0x0b /* PCI_BASE_CLASS_PROCESSOR */ + && classid != 0x0c06 /* PCI_CLASS_SERIAL_INFINIBAND */) + unlink_and_free_object_and_children(pchild); + } + } + } + + /* look at remaining children, process recursively, and remove useless bridges */ + for_each_child_safe(child, root, pchild) { + hwloc_drop_useless_io(topology, child); + + if (child->type == HWLOC_OBJ_BRIDGE) { + hwloc_obj_t grandchildren = child->first_child; + + if (!grandchildren) { + /* bridges with no children are removed if WHOLE_IO isn't given */ + if (!(topology->flags & (HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) { + *pchild = child->next_sibling; + hwloc_free_unlinked_object(child); + } + + } else if (child->attr->bridge.upstream_type != HWLOC_OBJ_BRIDGE_HOST) { + /* only hostbridges are kept if WHOLE_IO or IO_BRIDGE are not given */ + if (!(topology->flags & (HWLOC_TOPOLOGY_FLAG_IO_BRIDGES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) { + /* insert grandchildren in place of child */ + *pchild = grandchildren; + for( ; grandchildren->next_sibling != NULL ; grandchildren = grandchildren->next_sibling); + grandchildren->next_sibling = child->next_sibling; + hwloc_free_unlinked_object(child); + } + } + } + } +} + +static void 
+hwloc_propagate_bridge_depth(hwloc_topology_t topology, hwloc_obj_t root, unsigned depth)
+{
+  hwloc_obj_t child = root->first_child;
+  while (child) {
+    if (child->type == HWLOC_OBJ_BRIDGE) {
+      child->attr->bridge.depth = depth;
+      hwloc_propagate_bridge_depth(topology, child, depth+1);
+    }
+    child = child->next_sibling;
+  }
+}
+
+static void
+hwloc_propagate_symmetric_subtree(hwloc_topology_t topology, hwloc_obj_t root)
+{
+  hwloc_obj_t child, *array;
+
+  /* assume we're not symmetric by default */
+  root->symmetric_subtree = 0;
+
+  /* if no child, we are symmetric */
+  if (!root->arity) {
+    root->symmetric_subtree = 1;
+    return;
+  }
+
+  /* look at children, and return if they are not symmetric */
+  child = NULL;
+  while ((child = hwloc_get_next_child(topology, root, child)) != NULL)
+    hwloc_propagate_symmetric_subtree(topology, child);
+  while ((child = hwloc_get_next_child(topology, root, child)) != NULL)
+    if (!child->symmetric_subtree)
+      return;
+
+  /* now check that children subtrees are identical.
+   * just walk down the first child in each tree and compare their depth and arities
+   */
+  array = malloc(root->arity * sizeof(*array));
+  memcpy(array, root->children, root->arity * sizeof(*array));
+  while (1) {
+    unsigned i;
+    /* check current level arities and depth */
+    for(i=1; i<root->arity; i++)
+      if (array[i]->depth != array[0]->depth
+          || array[i]->arity != array[0]->arity) {
+        free(array);
+        return;
+      }
+    if (!array[0]->arity)
+      /* no more children level, we're ok */
+      break;
+    /* look at first child of each element now */
+    for(i=0; i<root->arity; i++)
+      array[i] = array[i]->first_child;
+  }
+  free(array);
+
+  /* everything went fine, we're symmetric */
+  root->symmetric_subtree = 1;
+}
+
+/*
+ * Initialize handy pointers in the whole topology.
+ * The topology only had first_child and next_sibling pointers.
+ * When this function returns, all parent/children pointers are initialized.
+ * The remaining fields (levels, cousins, logical_index, depth, ...) will
+ * be setup later in hwloc_connect_levels().
+ */
+void
+hwloc_connect_children(hwloc_obj_t parent)
+{
+  unsigned n;
+  hwloc_obj_t child, prev_child = NULL;
+
+  for (n = 0, child = parent->first_child;
+       child;
+       n++, prev_child = child, child = child->next_sibling) {
+    child->parent = parent;
+    child->sibling_rank = n;
+    child->prev_sibling = prev_child;
+  }
+  parent->last_child = prev_child;
+
+  parent->arity = n;
+  free(parent->children);
+  if (!n) {
+    parent->children = NULL;
+    return;
+  }
+
+  parent->children = malloc(n * sizeof(*parent->children));
+  for (n = 0, child = parent->first_child;
+       child;
+       n++, child = child->next_sibling) {
+    parent->children[n] = child;
+    hwloc_connect_children(child);
+  }
+}
+
+/*
+ * Check whether there is an object below ROOT that has the same type as OBJ
+ */
+static int
+find_same_type(hwloc_obj_t root, hwloc_obj_t obj)
+{
+  hwloc_obj_t child;
+
+  if (hwloc_type_cmp(root, obj) == HWLOC_TYPE_EQUAL)
+    return 1;
+
+  for (child = root->first_child; child; child = child->next_sibling)
+    if (find_same_type(child, obj))
+      return 1;
+
+  return 0;
+}
+
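After hwloc_connect_children(parent) above returns, the parent/children pointers obey simple invariants; a debug-style check might look like this (sketch, assuming assert.h):

    static void check_children_connected(hwloc_obj_t parent)
    {
      unsigned i;
      for (i = 0; i < parent->arity; i++) {
        assert(parent->children[i]->parent == parent);
        assert(parent->children[i]->sibling_rank == i);
      }
      assert(parent->last_child == (parent->arity ? parent->children[parent->arity - 1] : NULL));
    }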
+/* traverse the array of current objects and compare them with top_obj.
+ * if equal, take the object and put its children into the remaining objs.
+ * if not equal, put the object into the remaining objs.
+ */
+static int
+hwloc_level_take_objects(hwloc_obj_t top_obj,
+                         hwloc_obj_t *current_objs, unsigned n_current_objs,
+                         hwloc_obj_t *taken_objs, unsigned n_taken_objs __hwloc_attribute_unused,
+                         hwloc_obj_t *remaining_objs, unsigned n_remaining_objs __hwloc_attribute_unused)
+{
+  unsigned taken_i = 0;
+  unsigned new_i = 0;
+  unsigned i, j;
+
+  for (i = 0; i < n_current_objs; i++)
+    if (hwloc_type_cmp(top_obj, current_objs[i]) == HWLOC_TYPE_EQUAL) {
+      /* Take it, add children.  */
+      taken_objs[taken_i++] = current_objs[i];
+      for (j = 0; j < current_objs[i]->arity; j++)
+        remaining_objs[new_i++] = current_objs[i]->children[j];
+    } else {
+      /* Leave it.  */
+      remaining_objs[new_i++] = current_objs[i];
+    }
+
+#ifdef HWLOC_DEBUG
+  /* Make sure we didn't mess up.  */
+  assert(taken_i == n_taken_objs);
+  assert(new_i == n_current_objs - n_taken_objs + n_remaining_objs);
+#endif
+
+  return new_i;
+}
+
+/* Given an input object, copy it or its interesting children into the output array.
+ * If new_obj is NULL, we're just counting interesting objects.
+ */
+static unsigned
+hwloc_level_filter_object(hwloc_topology_t topology,
+                          hwloc_obj_t *new_obj, hwloc_obj_t old)
+{
+  unsigned i, total;
+  if (hwloc_obj_type_is_io(old->type)) {
+    if (new_obj)
+      append_iodevs(topology, old);
+    return 0;
+  }
+  if (old->type != HWLOC_OBJ_MISC) {
+    if (new_obj)
+      *new_obj = old;
+    return 1;
+  }
+  for(i=0, total=0; i<old->arity; i++) {
+    int nb = hwloc_level_filter_object(topology, new_obj, old->children[i]);
+    if (new_obj) {
+      new_obj += nb;
+    }
+    total += nb;
+  }
+  return total;
+}
+
+/* Replace an input array of objects with an input array containing
+ * only interesting objects for levels.
+ * Misc objects are removed, their interesting children are added.
+ * I/O devices are removed and queued to their own lists.
+ */
+static int
+hwloc_level_filter_objects(hwloc_topology_t topology,
+                           hwloc_obj_t **objs, unsigned *n_objs)
+{
+  hwloc_obj_t *old = *objs, *new;
+  unsigned nold = *n_objs, nnew, i;
+
+  /* anything to filter? */
+  for(i=0; i<nold; i++)
+    if (hwloc_obj_type_is_io(old[i]->type)
+        || old[i]->type == HWLOC_OBJ_MISC)
+      break;
+  if (i==nold)
+    return 0;
+
+  /* count interesting objects and allocate the new array */
+  for(i=0, nnew=0; i<nold; i++)
+    nnew += hwloc_level_filter_object(topology, NULL, old[i]);
+  new = malloc(nnew * sizeof(*new));
+  if (!new) {
+    free(old);
+    errno = ENOMEM;
+    return -1;
+  }
+  /* copy interesting objects and their interesting children */
+  for(i=0, nnew=0; i<nold; i++)
+    nnew += hwloc_level_filter_object(topology, new+nnew, old[i]);
+  free(old);
+
+  *objs = new;
+  *n_objs = nnew;
+  return 0;
+}
+
+static unsigned
+hwloc_build_level_from_list(struct hwloc_obj *first, struct hwloc_obj ***levelp)
+{
+  unsigned i, nb;
+  struct hwloc_obj * obj;
+
+  /* count */
+  obj = first;
+  i = 0;
+  while (obj) {
+    i++;
+    obj = obj->next_cousin;
+  }
+  nb = i;
+
+  /* allocate and fill level */
+  *levelp = malloc(nb * sizeof(struct hwloc_obj *));
+  obj = first;
+  i = 0;
+  while (obj) {
+    obj->logical_index = i;
+    (*levelp)[i] = obj;
+    i++;
+    obj = obj->next_cousin;
+  }
+
+  return nb;
+}
+
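Once hwloc_connect_levels() below has filled the per-depth arrays, they are what the public iteration API walks; e.g. (sketch using public hwloc calls):

    #include <hwloc.h>

    static void walk_levels(hwloc_topology_t topo)
    {
      unsigned depth, i;
      for (depth = 0; depth < hwloc_topology_get_depth(topo); depth++)
        for (i = 0; i < hwloc_get_nbobjs_by_depth(topo, depth); i++)
          /* logical_index was assigned sequentially when the level was built */
          (void) hwloc_get_obj_by_depth(topo, depth, i);
    }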
+/*
+ * Do the remaining work that hwloc_connect_children() did not do earlier.
+ */
+int
+hwloc_connect_levels(hwloc_topology_t topology)
+{
+  unsigned l, i=0;
+  hwloc_obj_t *objs, *taken_objs, *new_objs, top_obj;
+  unsigned n_objs, n_taken_objs, n_new_objs;
+  int err;
+
+  /* reset non-root levels (root was initialized during init and will not change here) */
+  for(l=1; l<HWLOC_DEPTH_MAX; l++)
+    free(topology->levels[l]);
+  memset(topology->levels+1, 0, (HWLOC_DEPTH_MAX-1)*sizeof(*topology->levels));
+  memset(topology->level_nbobjects+1, 0, (HWLOC_DEPTH_MAX-1)*sizeof(*topology->level_nbobjects));
+  topology->nb_levels = 1;
+  /* don't touch next_group_depth, the Group objects are still here */
+
+  /* initialize all depth to unknown */
+  for (l = HWLOC_OBJ_SYSTEM; l < HWLOC_OBJ_TYPE_MAX; l++)
+    topology->type_depth[l] = HWLOC_TYPE_DEPTH_UNKNOWN;
+  /* initialize root type depth */
+  topology->type_depth[topology->levels[0][0]->type] = 0;
+
+  /* initialize I/O special levels */
+  free(topology->bridge_level);
+  topology->bridge_level = NULL;
+  topology->bridge_nbobjects = 0;
+  topology->first_bridge = topology->last_bridge = NULL;
+  topology->type_depth[HWLOC_OBJ_BRIDGE] = HWLOC_TYPE_DEPTH_BRIDGE;
+  free(topology->pcidev_level);
+  topology->pcidev_level = NULL;
+  topology->pcidev_nbobjects = 0;
+  topology->first_pcidev = topology->last_pcidev = NULL;
+  topology->type_depth[HWLOC_OBJ_PCI_DEVICE] = HWLOC_TYPE_DEPTH_PCI_DEVICE;
+  free(topology->osdev_level);
+  topology->osdev_level = NULL;
+  topology->osdev_nbobjects = 0;
+  topology->first_osdev = topology->last_osdev = NULL;
+  topology->type_depth[HWLOC_OBJ_OS_DEVICE] = HWLOC_TYPE_DEPTH_OS_DEVICE;
+
+  /* Start with children of the whole system.  */
+  n_objs = topology->levels[0][0]->arity;
+  objs = malloc(n_objs * sizeof(objs[0]));
+  if (!objs) {
+    errno = ENOMEM;
+    return -1;
+  }
+  memcpy(objs, topology->levels[0][0]->children, n_objs*sizeof(objs[0]));
+
+  /* Filter-out interesting objects */
+  err = hwloc_level_filter_objects(topology, &objs, &n_objs);
+  if (err < 0)
+    return -1;
+
+  /* Keep building levels while there are objects left in OBJS.  */
+  while (n_objs) {
+    /* At this point, the objs array contains only objects that may go into levels */
+
+    /* First find which type of object is the topmost.
+     * Don't use PU if there are other types since we want to keep PU at the bottom.
+     */
+
+    /* Look for the first non-PU object, and use the first PU if we really find nothing else */
+    for (i = 0; i < n_objs; i++)
+      if (objs[i]->type != HWLOC_OBJ_PU)
+        break;
+    top_obj = i == n_objs ? objs[0] : objs[i];
+
+    /* See if this is actually the topmost object */
+    for (i = 0; i < n_objs; i++) {
+      if (hwloc_type_cmp(top_obj, objs[i]) != HWLOC_TYPE_EQUAL) {
+        if (find_same_type(objs[i], top_obj)) {
+          /* OBJS[i] is strictly above an object of the same type as TOP_OBJ, so it
+           * is above TOP_OBJ.  */
+          top_obj = objs[i];
+        }
+      }
+    }
+
+    /* Now pick all objects of the same type, build a level with that and
+     * replace them with their children.  */
+
+    /* First count them.  */
+    n_taken_objs = 0;
+    n_new_objs = 0;
+    for (i = 0; i < n_objs; i++)
+      if (hwloc_type_cmp(top_obj, objs[i]) == HWLOC_TYPE_EQUAL) {
+        n_taken_objs++;
+        n_new_objs += objs[i]->arity;
+      }
+
+    /* New level.  */
+    taken_objs = malloc((n_taken_objs + 1) * sizeof(taken_objs[0]));
+    /* New list of pending objects.
+    if (n_objs - n_taken_objs + n_new_objs) {
+      new_objs = malloc((n_objs - n_taken_objs + n_new_objs) * sizeof(new_objs[0]));
+    } else {
+#ifdef HWLOC_DEBUG
+      assert(!n_new_objs);
+      assert(n_objs == n_taken_objs);
+#endif
+      new_objs = NULL;
+    }
+
+    n_new_objs = hwloc_level_take_objects(top_obj,
+                                          objs, n_objs,
+                                          taken_objs, n_taken_objs,
+                                          new_objs, n_new_objs);
+
+    /* Ok, put numbers in the level and link cousins. */
+    for (i = 0; i < n_taken_objs; i++) {
+      taken_objs[i]->depth = topology->nb_levels;
+      taken_objs[i]->logical_index = i;
+      if (i) {
+        taken_objs[i]->prev_cousin = taken_objs[i-1];
+        taken_objs[i-1]->next_cousin = taken_objs[i];
+      }
+    }
+    taken_objs[0]->prev_cousin = NULL;
+    taken_objs[n_taken_objs-1]->next_cousin = NULL;
+
+    /* One more level! */
+    if (top_obj->type == HWLOC_OBJ_CACHE)
+      hwloc_debug("--- Cache level depth %u", top_obj->attr->cache.depth);
+    else
+      hwloc_debug("--- %s level", hwloc_obj_type_string(top_obj->type));
+    hwloc_debug(" has number %u\n\n", topology->nb_levels);
+
+    if (topology->type_depth[top_obj->type] == HWLOC_TYPE_DEPTH_UNKNOWN)
+      topology->type_depth[top_obj->type] = topology->nb_levels;
+    else
+      topology->type_depth[top_obj->type] = HWLOC_TYPE_DEPTH_MULTIPLE; /* mark as multiple */
+
+    taken_objs[n_taken_objs] = NULL;
+
+    topology->level_nbobjects[topology->nb_levels] = n_taken_objs;
+    topology->levels[topology->nb_levels] = taken_objs;
+
+    topology->nb_levels++;
+
+    free(objs);
+
+    /* Switch to new_objs, after keeping only the objects interesting for levels */
+    err = hwloc_level_filter_objects(topology, &new_objs, &n_new_objs);
+    if (err < 0)
+      return -1;
+
+    objs = new_objs;
+    n_objs = n_new_objs;
+  }
+
+  /* It's empty now. */
+  if (objs)
+    free(objs);
+
+  topology->bridge_nbobjects = hwloc_build_level_from_list(topology->first_bridge, &topology->bridge_level);
+  topology->pcidev_nbobjects = hwloc_build_level_from_list(topology->first_pcidev, &topology->pcidev_level);
+  topology->osdev_nbobjects = hwloc_build_level_from_list(topology->first_osdev, &topology->osdev_level);
+
+  hwloc_propagate_symmetric_subtree(topology, topology->levels[0][0]);
+
+  return 0;
+}
+
+void hwloc_alloc_obj_cpusets(hwloc_obj_t obj)
+{
+  obj->cpuset = hwloc_bitmap_alloc_full();
+  obj->complete_cpuset = hwloc_bitmap_alloc();
+  obj->online_cpuset = hwloc_bitmap_alloc_full();
+  obj->allowed_cpuset = hwloc_bitmap_alloc_full();
+  obj->nodeset = hwloc_bitmap_alloc();
+  obj->complete_nodeset = hwloc_bitmap_alloc();
+  obj->allowed_nodeset = hwloc_bitmap_alloc_full();
+}
+
+/* Main discovery loop */
+static int
+hwloc_discover(struct hwloc_topology *topology)
+{
+  struct hwloc_backend *backend;
+  int gotsomeio = 0;
+  unsigned discoveries = 0;
+  unsigned need_reconnect = 0;
+
+  /* discover() callbacks should use hwloc_insert to add objects initialized
+   * through hwloc_alloc_setup_object.
+   * For node levels, nodeset and memory must be initialized.
+   * For cache levels, memory and type/depth must be initialized.
+   * For group levels, depth must be initialized.
+   */
+
+  /* There must be at least a PU object for each logical processor, at worst
+   * produced by hwloc_setup_pu_level()
+   */
+
+  /* To be able to just use hwloc_insert_object_by_cpuset to insert the object
+   * in the topology according to the cpuset, the cpuset field must be
+   * initialized.
+   */
+
+  /* A priori, all processors are visible in the topology, online, and allowed
+   * for the application.
+   *
+   * - If some processors exist but topology information is unknown for them
+   *   (and thus the backend couldn't create objects for them), they should be
+   *   added to the complete_cpuset field of the lowest object where the object
+   *   could reside.
+   *
+   * - If some processors are not online, they should be dropped from the
+   *   online_cpuset field.
+   *
+   * - If some processors are not allowed for the application (e.g. for
+   *   administration reasons), they should be dropped from the allowed_cpuset
+   *   field.
+   *
+   * The same applies to the node sets complete_nodeset and allowed_nodeset.
+   *
+   * If such a field doesn't exist yet, it can be allocated, and initialized to
+   * zero (for complete), or to full (for online and allowed). The values are
+   * automatically propagated to the whole tree after detection.
+   */
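+
+  /* Illustrative sketch (not upstream code) of a discover() callback using
+   * the fields described above; my_discover and the PU number 3 are made up:
+   *
+   *   static int my_discover(struct hwloc_backend *backend) {
+   *     struct hwloc_topology *topology = backend->topology;
+   *     hwloc_obj_t obj = hwloc_alloc_setup_object(HWLOC_OBJ_PU, 3);
+   *     obj->cpuset = hwloc_bitmap_alloc();
+   *     hwloc_bitmap_set(obj->cpuset, 3);  // cpuset must be set before insertion
+   *     hwloc_insert_object_by_cpuset(topology, obj);
+   *     return 1;  // > 0 tells the loop below that objects were added
+   *   }
+   */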
+
+  /*
+   * Discover CPUs first
+   */
+  backend = topology->backends;
+  while (NULL != backend) {
+    int err;
+    if (backend->component->type != HWLOC_DISC_COMPONENT_TYPE_CPU
+        && backend->component->type != HWLOC_DISC_COMPONENT_TYPE_GLOBAL)
+      /* not yet */
+      goto next_cpubackend;
+    if (!backend->discover)
+      goto next_cpubackend;
+
+    if (need_reconnect && (backend->flags & HWLOC_BACKEND_FLAG_NEED_LEVELS)) {
+      hwloc_debug("Backend %s forcing a reconnect of levels\n", backend->component->name);
+      hwloc_connect_children(topology->levels[0][0]);
+      if (hwloc_connect_levels(topology) < 0)
+        return -1;
+      need_reconnect = 0;
+    }
+
+    err = backend->discover(backend);
+    if (err >= 0) {
+      if (backend->component->type == HWLOC_DISC_COMPONENT_TYPE_GLOBAL)
+        gotsomeio += err;
+      discoveries++;
+      if (err > 0)
+        need_reconnect++;
+    }
+    print_objects(topology, 0, topology->levels[0][0]);
+
+next_cpubackend:
+    backend = backend->next;
+  }
+
+  if (!discoveries) {
+    hwloc_debug("%s", "No CPU backend enabled or no discovery succeeded\n");
+    errno = EINVAL;
+    return -1;
+  }
+
+  /*
+   * Group levels by distances
+   */
+  hwloc_distances_finalize_os(topology);
+  hwloc_group_by_distances(topology);
+
+  /* Update objects cpusets and nodesets now that the CPU/GLOBAL backend populated PUs and nodes */
+
+  hwloc_debug("%s", "\nRestrict topology cpusets to existing PU and NODE objects\n");
+  collect_proc_cpuset(topology->levels[0][0], NULL);
+
+  hwloc_debug("%s", "\nPropagate offline and disallowed cpus down and up\n");
+  propagate_unused_cpuset(topology->levels[0][0], NULL);
+
+  if (topology->levels[0][0]->complete_nodeset && hwloc_bitmap_iszero(topology->levels[0][0]->complete_nodeset)) {
+    /* No nodeset, drop all of them */
+    hwloc_bitmap_free(topology->levels[0][0]->nodeset);
+    topology->levels[0][0]->nodeset = NULL;
+    hwloc_bitmap_free(topology->levels[0][0]->complete_nodeset);
+    topology->levels[0][0]->complete_nodeset = NULL;
+    hwloc_bitmap_free(topology->levels[0][0]->allowed_nodeset);
+    topology->levels[0][0]->allowed_nodeset = NULL;
+  }
+  hwloc_debug("%s", "\nPropagate nodesets\n");
+  propagate_nodeset(topology->levels[0][0], NULL);
+  propagate_nodesets(topology->levels[0][0]);
+
+  print_objects(topology, 0, topology->levels[0][0]);
+
+  if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM)) {
+    hwloc_debug("%s", "\nRemoving unauthorized and offline cpusets from all cpusets\n");
+    remove_unused_cpusets(topology->levels[0][0]);
+
+    hwloc_debug("%s", "\nRemoving disallowed memory according to nodesets\n");
+    apply_nodeset(topology->levels[0][0], NULL);
+
+    print_objects(topology, 0, topology->levels[0][0]);
+  }
+
+  hwloc_debug("%s", "\nAdd default object sets\n");
+  add_default_object_sets(topology->levels[0][0], 0);
+
+  /* Now connect handy pointers to make remaining discovery easier. */
+  hwloc_debug("%s", "\nOk, finished tweaking, now connect\n");
+  hwloc_connect_children(topology->levels[0][0]);
+  if (hwloc_connect_levels(topology) < 0)
+    return -1;
+  print_objects(topology, 0, topology->levels[0][0]);
+
+  /*
+   * Additional discovery with other backends
+   */
+
+  backend = topology->backends;
+  need_reconnect = 0;
+  while (NULL != backend) {
+    int err;
+    if (backend->component->type == HWLOC_DISC_COMPONENT_TYPE_CPU
+        || backend->component->type == HWLOC_DISC_COMPONENT_TYPE_GLOBAL)
+      /* already done above */
+      goto next_noncpubackend;
+    if (!backend->discover)
+      goto next_noncpubackend;
+
+    if (need_reconnect && (backend->flags & HWLOC_BACKEND_FLAG_NEED_LEVELS)) {
+      hwloc_debug("Backend %s forcing a reconnect of levels\n", backend->component->name);
+      hwloc_connect_children(topology->levels[0][0]);
+      if (hwloc_connect_levels(topology) < 0)
+        return -1;
+      need_reconnect = 0;
+    }
+
+    err = backend->discover(backend);
+    if (err >= 0) {
+      gotsomeio += err;
+      if (err > 0)
+        need_reconnect++;
+    }
+    print_objects(topology, 0, topology->levels[0][0]);
+
+next_noncpubackend:
+    backend = backend->next;
+  }
+
+  /* if we got anything, filter interesting objects and update the tree */
+  if (gotsomeio) {
+    hwloc_drop_useless_io(topology, topology->levels[0][0]);
+    hwloc_debug("%s", "\nNow reconnecting\n");
+    print_objects(topology, 0, topology->levels[0][0]);
+    hwloc_propagate_bridge_depth(topology, topology->levels[0][0], 0);
+  }
+
+  /* Remove some stuff */
+
+  hwloc_debug("%s", "\nRemoving ignored objects\n");
+  remove_ignored(topology, &topology->levels[0][0]);
+  print_objects(topology, 0, topology->levels[0][0]);
+
+  hwloc_debug("%s", "\nRemoving empty objects except numa nodes and PCI devices\n");
+  remove_empty(topology, &topology->levels[0][0]);
+  if (!topology->levels[0][0]) {
+    fprintf(stderr, "Topology became empty, aborting!\n");
+    abort();
+  }
+  print_objects(topology, 0, topology->levels[0][0]);
+
+  hwloc_debug("%s", "\nRemoving objects whose type has HWLOC_IGNORE_TYPE_KEEP_STRUCTURE and have only one child or are the only child\n");
+  merge_useless_child(topology, &topology->levels[0][0]);
+  print_objects(topology, 0, topology->levels[0][0]);
+
+  /* Reconnect things after all these changes */
+  hwloc_connect_children(topology->levels[0][0]);
+  if (hwloc_connect_levels(topology) < 0)
+    return -1;
+
+  /* accumulate children memory in total_memory fields (only once parent is set) */
+  hwloc_debug("%s", "\nPropagate total memory up\n");
+  propagate_total_memory(topology->levels[0][0]);
+
+  /*
+   * Now that objects are numbered, take distance matrices from backends and put them in the main topology.
+   *
+   * Some objects may have disappeared (in remove_empty or remove_ignored) since we setup os distances
+   * (hwloc_distances_finalize_os()) above. Reset them so as to not point to disappeared objects anymore.
+   */
+  hwloc_distances_restrict_os(topology);
+  hwloc_distances_finalize_os(topology);
+  hwloc_distances_finalize_logical(topology);
+
+  /*
+   * Now set binding hooks according to topology->is_thissystem
+   * and what the native OS backend offers.
+   */
+  hwloc_set_binding_hooks(topology);
+
+  return 0;
+}
+
+/* To be called before discovery is actually launched.
+ * Resets everything in case a previous load initialized some stuff.
+ */ +void +hwloc_topology_setup_defaults(struct hwloc_topology *topology) +{ + struct hwloc_obj *root_obj; + + /* reset support */ + memset(&topology->binding_hooks, 0, sizeof(topology->binding_hooks)); + memset(topology->support.discovery, 0, sizeof(*topology->support.discovery)); + memset(topology->support.cpubind, 0, sizeof(*topology->support.cpubind)); + memset(topology->support.membind, 0, sizeof(*topology->support.membind)); + + /* Only the System object on top by default */ + topology->nb_levels = 1; /* there's at least SYSTEM */ + topology->next_group_depth = 0; + topology->levels[0] = malloc (sizeof (hwloc_obj_t)); + topology->level_nbobjects[0] = 1; + /* NULLify other levels so that we can detect and free old ones in hwloc_connect_levels() if needed */ + memset(topology->levels+1, 0, (HWLOC_DEPTH_MAX-1)*sizeof(*topology->levels)); + topology->bridge_level = NULL; + topology->pcidev_level = NULL; + topology->osdev_level = NULL; + topology->first_bridge = topology->last_bridge = NULL; + topology->first_pcidev = topology->last_pcidev = NULL; + topology->first_osdev = topology->last_osdev = NULL; + + /* Create the actual machine object, but don't touch its attributes yet + * since the OS backend may still change the object into something else + * (for instance System) + */ + root_obj = hwloc_alloc_setup_object(HWLOC_OBJ_MACHINE, 0); + root_obj->depth = 0; + root_obj->logical_index = 0; + root_obj->sibling_rank = 0; + topology->levels[0][0] = root_obj; +} + +int +hwloc_topology_init (struct hwloc_topology **topologyp) +{ + struct hwloc_topology *topology; + int i; + + topology = malloc (sizeof (struct hwloc_topology)); + if(!topology) + return -1; + + hwloc_components_init(topology); + + /* Setup topology context */ + topology->is_loaded = 0; + topology->flags = 0; + topology->is_thissystem = 1; + topology->pid = 0; + + topology->support.discovery = malloc(sizeof(*topology->support.discovery)); + topology->support.cpubind = malloc(sizeof(*topology->support.cpubind)); + topology->support.membind = malloc(sizeof(*topology->support.membind)); + + /* Only ignore useless cruft by default */ + for(i = HWLOC_OBJ_SYSTEM; i < HWLOC_OBJ_TYPE_MAX; i++) + topology->ignored_types[i] = HWLOC_IGNORE_TYPE_NEVER; + topology->ignored_types[HWLOC_OBJ_GROUP] = HWLOC_IGNORE_TYPE_KEEP_STRUCTURE; + + hwloc_distances_init(topology); + + topology->userdata_export_cb = NULL; + topology->userdata_import_cb = NULL; + + /* Make the topology look like something coherent but empty */ + hwloc_topology_setup_defaults(topology); + + *topologyp = topology; + return 0; +} + +int +hwloc_topology_set_pid(struct hwloc_topology *topology __hwloc_attribute_unused, + hwloc_pid_t pid __hwloc_attribute_unused) +{ + /* this does *not* change the backend */ +#ifdef HWLOC_LINUX_SYS + topology->pid = pid; + return 0; +#else /* HWLOC_LINUX_SYS */ + errno = ENOSYS; + return -1; +#endif /* HWLOC_LINUX_SYS */ +} + +int +hwloc_topology_set_fsroot(struct hwloc_topology *topology, const char *fsroot_path) +{ + return hwloc_disc_component_force_enable(topology, + 0 /* api */, + HWLOC_DISC_COMPONENT_TYPE_CPU, "linux", + fsroot_path, NULL, NULL); +} + +int +hwloc_topology_set_synthetic(struct hwloc_topology *topology, const char *description) +{ + return hwloc_disc_component_force_enable(topology, + 0 /* api */, + -1, "synthetic", + description, NULL, NULL); +} + +int +hwloc_topology_set_xml(struct hwloc_topology *topology, + const char *xmlpath) +{ + return hwloc_disc_component_force_enable(topology, + 0 /* api */, + -1, "xml", + xmlpath, 
+                                           NULL, NULL);
+}
+
+int
+hwloc_topology_set_xmlbuffer(struct hwloc_topology *topology,
+                             const char *xmlbuffer,
+                             int size)
+{
+  return hwloc_disc_component_force_enable(topology,
+                                           0 /* api */,
+                                           -1, "xml", NULL,
+                                           xmlbuffer, (void*) (uintptr_t) size);
+}
+
+int
+hwloc_topology_set_custom(struct hwloc_topology *topology)
+{
+  return hwloc_disc_component_force_enable(topology,
+                                           0 /* api */,
+                                           -1, "custom",
+                                           NULL, NULL, NULL);
+}
+
+int
+hwloc_topology_set_flags (struct hwloc_topology *topology, unsigned long flags)
+{
+  if (topology->is_loaded) {
+    /* actually harmless */
+    errno = EBUSY;
+    return -1;
+  }
+  topology->flags = flags;
+  return 0;
+}
+
+unsigned long
+hwloc_topology_get_flags (struct hwloc_topology *topology)
+{
+  return topology->flags;
+}
+
+int
+hwloc_topology_ignore_type(struct hwloc_topology *topology, hwloc_obj_type_t type)
+{
+  if (type >= HWLOC_OBJ_TYPE_MAX) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (type == HWLOC_OBJ_PU) {
+    /* we need the PU level */
+    errno = EINVAL;
+    return -1;
+  } else if (hwloc_obj_type_is_io(type)) {
+    /* I/O devices aren't in any level, use topology flags to ignore them */
+    errno = EINVAL;
+    return -1;
+  }
+
+  topology->ignored_types[type] = HWLOC_IGNORE_TYPE_ALWAYS;
+  return 0;
+}
+
+int
+hwloc_topology_ignore_type_keep_structure(struct hwloc_topology *topology, hwloc_obj_type_t type)
+{
+  if (type >= HWLOC_OBJ_TYPE_MAX) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (type == HWLOC_OBJ_PU) {
+    /* we need the PU level */
+    errno = EINVAL;
+    return -1;
+  } else if (hwloc_obj_type_is_io(type)) {
+    /* I/O devices aren't in any level, use topology flags to ignore them */
+    errno = EINVAL;
+    return -1;
+  }
+
+  topology->ignored_types[type] = HWLOC_IGNORE_TYPE_KEEP_STRUCTURE;
+  return 0;
+}
+
+int
+hwloc_topology_ignore_all_keep_structure(struct hwloc_topology *topology)
+{
+  unsigned type;
+  for(type = HWLOC_OBJ_SYSTEM; type < HWLOC_OBJ_TYPE_MAX; type++)
+    if (type != HWLOC_OBJ_PU
+        && !hwloc_obj_type_is_io((hwloc_obj_type_t) type))
+      topology->ignored_types[type] = HWLOC_IGNORE_TYPE_KEEP_STRUCTURE;
+  return 0;
+}
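+
+/* Illustrative sketch (not upstream code): a caller that wants, say, Cache
+ * objects kept only when they bring structure would do, before loading:
+ *
+ *   hwloc_topology_ignore_type_keep_structure(topology, HWLOC_OBJ_CACHE);
+ *   hwloc_topology_load(topology);
+ */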
+
+/* traverse the tree and free everything.
+ * only use first_child/next_sibling so that it works before load()
+ * and may be used when switching between backends.
+ */
+static void
+hwloc_topology_clear_tree (struct hwloc_topology *topology, struct hwloc_obj *root)
+{
+  hwloc_obj_t child = root->first_child;
+  while (child) {
+    hwloc_obj_t nextchild = child->next_sibling;
+    hwloc_topology_clear_tree (topology, child);
+    child = nextchild;
+  }
+  hwloc_free_unlinked_object (root);
+}
+
+void
+hwloc_topology_clear (struct hwloc_topology *topology)
+{
+  unsigned l;
+  hwloc_topology_clear_tree (topology, topology->levels[0][0]);
+  for (l=0; l<topology->nb_levels; l++) {
+    free(topology->levels[l]);
+    topology->levels[l] = NULL;
+  }
+  free(topology->bridge_level);
+  free(topology->pcidev_level);
+  free(topology->osdev_level);
+}
+
+void
+hwloc_topology_destroy (struct hwloc_topology *topology)
+{
+  hwloc_backends_disable_all(topology);
+  hwloc_components_destroy_all(topology);
+
+  hwloc_topology_clear(topology);
+  hwloc_distances_destroy(topology);
+
+  free(topology->support.discovery);
+  free(topology->support.cpubind);
+  free(topology->support.membind);
+  free(topology);
+}
+
+int
+hwloc_topology_load (struct hwloc_topology *topology)
+{
+  int err;
+
+  if (topology->is_loaded) {
+    errno = EBUSY;
+    return -1;
+  }
+
+  /* enforce backend anyway if a FORCE variable was given */
+  {
+    char *fsroot_path_env = getenv("HWLOC_FORCE_FSROOT");
+    if (fsroot_path_env)
+      hwloc_disc_component_force_enable(topology,
+                                        1 /* env force */,
+                                        HWLOC_DISC_COMPONENT_TYPE_CPU, "linux",
+                                        fsroot_path_env, NULL, NULL);
+  }
+  {
+    char *xmlpath_env = getenv("HWLOC_FORCE_XMLFILE");
+    if (xmlpath_env)
+      hwloc_disc_component_force_enable(topology,
+                                        1 /* env force */,
+                                        -1, "xml",
+                                        xmlpath_env, NULL, NULL);
+  }
+
+  /* only apply non-FORCE variables if we have not changed the backend yet */
+  if (!topology->backends) {
+    char *fsroot_path_env = getenv("HWLOC_FSROOT");
+    if (fsroot_path_env)
+      hwloc_disc_component_force_enable(topology,
+                                        1 /* env force */,
+                                        HWLOC_DISC_COMPONENT_TYPE_CPU, "linux",
+                                        fsroot_path_env, NULL, NULL);
+  }
+  if (!topology->backends) {
+    char *xmlpath_env = getenv("HWLOC_XMLFILE");
+    if (xmlpath_env)
+      hwloc_disc_component_force_enable(topology,
+                                        1 /* env force */,
+                                        -1, "xml",
+                                        xmlpath_env, NULL, NULL);
+  }
+
+  /* instantiate all possible other backends now */
+  hwloc_disc_components_enable_others(topology);
+  /* now that backends are enabled, update the thissystem flag */
+  hwloc_backends_is_thissystem(topology);
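+
+  /* Illustrative note (not upstream code): at this point, exporting
+   * HWLOC_XMLFILE in the environment has had the same effect as an explicit
+   * hwloc_topology_set_xml(topology, "/path/to/topology.xml") call before
+   * load; the path is made up for the example.
+   */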
+  /* get distance matrices from the environment and store them (as indexes) in the topology.
+   * indexes will be converted into objects later once the tree is filled
+   */
+  hwloc_distances_set_from_env(topology);
+
+  /* actual topology discovery */
+  err = hwloc_discover(topology);
+  if (err < 0)
+    goto out;
+
+#ifndef HWLOC_DEBUG
+  if (getenv("HWLOC_DEBUG_CHECK"))
+#endif
+    hwloc_topology_check(topology);
+
+  topology->is_loaded = 1;
+  return 0;
+
+ out:
+  hwloc_topology_clear(topology);
+  hwloc_distances_destroy(topology);
+  hwloc_topology_setup_defaults(topology);
+  hwloc_backends_disable_all(topology);
+  return -1;
+}
+
+int
+hwloc_topology_restrict(struct hwloc_topology *topology, hwloc_const_cpuset_t cpuset, unsigned long flags)
+{
+  hwloc_bitmap_t droppedcpuset, droppednodeset;
+
+  /* make sure we'll keep something in the topology */
+  if (!hwloc_bitmap_intersects(cpuset, topology->levels[0][0]->cpuset)) {
+    errno = EINVAL; /* easy failure, just don't touch the topology */
+    return -1;
+  }
+
+  droppedcpuset = hwloc_bitmap_alloc();
+  droppednodeset = hwloc_bitmap_alloc();
+
+  /* drop objects based on the reverse of cpuset, and fill the 'dropped' nodeset */
+  hwloc_bitmap_not(droppedcpuset, cpuset);
+  restrict_object(topology, flags, &topology->levels[0][0], droppedcpuset, droppednodeset, 0 /* root cannot be removed */);
+  /* update nodesets according to the dropped nodeset */
+  restrict_object_nodeset(topology, &topology->levels[0][0], droppednodeset);
+
+  hwloc_bitmap_free(droppedcpuset);
+  hwloc_bitmap_free(droppednodeset);
+
+  hwloc_connect_children(topology->levels[0][0]);
+  if (hwloc_connect_levels(topology) < 0)
+    goto out;
+
+  propagate_total_memory(topology->levels[0][0]);
+  hwloc_distances_restrict(topology, flags);
+  hwloc_distances_finalize_os(topology);
+  hwloc_distances_finalize_logical(topology);
+  return 0;
+
+ out:
+  /* unrecoverable failure, re-init the topology */
+  hwloc_topology_clear(topology);
+  hwloc_distances_destroy(topology);
+  hwloc_topology_setup_defaults(topology);
+  return -1;
+}
+
+int
+hwloc_topology_is_thissystem(struct hwloc_topology *topology)
+{
+  return topology->is_thissystem;
+}
+
+unsigned
+hwloc_topology_get_depth(struct hwloc_topology *topology)
+{
+  return topology->nb_levels;
+}
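+
+/* Illustrative sketch (not upstream code): restricting a loaded topology to
+ * its first two PUs; the cpuset is built here only for the example.
+ *
+ *   hwloc_bitmap_t set = hwloc_bitmap_alloc();
+ *   hwloc_bitmap_set_range(set, 0, 1);
+ *   if (hwloc_topology_restrict(topology, set, 0) < 0)
+ *     perror("hwloc_topology_restrict");
+ *   hwloc_bitmap_free(set);
+ */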
+
+/* check the children of a given parent object */
+static void
+hwloc__check_children(struct hwloc_obj *parent)
+{
+  hwloc_bitmap_t remaining_parent_set;
+  unsigned j;
+
+  if (!parent->arity) {
+    /* check whether that parent has no children for real */
+    assert(!parent->children);
+    assert(!parent->first_child);
+    assert(!parent->last_child);
+    return;
+  }
+  /* check whether that parent has children for real */
+  assert(parent->children);
+  assert(parent->first_child);
+  assert(parent->last_child);
+
+  /* first child specific checks */
+  assert(parent->first_child->sibling_rank == 0);
+  assert(parent->first_child == parent->children[0]);
+  assert(parent->first_child->prev_sibling == NULL);
+
+  /* last child specific checks */
+  assert(parent->last_child->sibling_rank == parent->arity-1);
+  assert(parent->last_child == parent->children[parent->arity-1]);
+  assert(parent->last_child->next_sibling == NULL);
+
+  if (parent->cpuset) {
+    remaining_parent_set = hwloc_bitmap_dup(parent->cpuset);
+    for(j=0; j<parent->arity; j++) {
+      if (!parent->children[j]->cpuset)
+        continue;
+      /* check that the child cpuset is included in the parent */
+      assert(hwloc_bitmap_isincluded(parent->children[j]->cpuset, remaining_parent_set));
+#if !defined(NDEBUG)
+      /* check that children are correctly ordered (see below), empty ones may be anywhere */
+      if (!hwloc_bitmap_iszero(parent->children[j]->cpuset)) {
+        int firstchild = hwloc_bitmap_first(parent->children[j]->cpuset);
+        int firstparent = hwloc_bitmap_first(remaining_parent_set);
+        assert(firstchild == firstparent);
+      }
+#endif
+      /* clear previously used parent cpuset bits so that we actually check above
+       * that children cpusets do not intersect and are ordered properly
+       */
+      hwloc_bitmap_andnot(remaining_parent_set, remaining_parent_set, parent->children[j]->cpuset);
+    }
+    assert(hwloc_bitmap_iszero(remaining_parent_set));
+    hwloc_bitmap_free(remaining_parent_set);
+  }
+
+  /* checks for all children */
+  for(j=1; j<parent->arity; j++) {
+    assert(parent->children[j]->parent == parent);
+    assert(parent->children[j]->sibling_rank == j);
+    assert(parent->children[j-1]->next_sibling == parent->children[j]);
+    assert(parent->children[j]->prev_sibling == parent->children[j-1]);
+  }
+}
+
+static void
+hwloc__check_children_depth(struct hwloc_topology *topology, struct hwloc_obj *parent)
+{
+  hwloc_obj_t child = NULL;
+  while ((child = hwloc_get_next_child(topology, parent, child)) != NULL) {
+    if (child->type == HWLOC_OBJ_BRIDGE)
+      assert(child->depth == (unsigned) HWLOC_TYPE_DEPTH_BRIDGE);
+    else if (child->type == HWLOC_OBJ_PCI_DEVICE)
+      assert(child->depth == (unsigned) HWLOC_TYPE_DEPTH_PCI_DEVICE);
+    else if (child->type == HWLOC_OBJ_OS_DEVICE)
+      assert(child->depth == (unsigned) HWLOC_TYPE_DEPTH_OS_DEVICE);
+    else if (child->type == HWLOC_OBJ_MISC)
+      assert(child->depth == (unsigned) -1);
+    else if (parent->depth != (unsigned) -1)
+      assert(child->depth > parent->depth);
+    hwloc__check_children_depth(topology, child);
+  }
+}
+
+/* check a whole topology structure */
+void
+hwloc_topology_check(struct hwloc_topology *topology)
+{
+  struct hwloc_obj *obj;
+  hwloc_obj_type_t type;
+  unsigned i, j, depth;
+
+  /* check type orders */
+  for (type = HWLOC_OBJ_SYSTEM; type < HWLOC_OBJ_TYPE_MAX; type++) {
+    assert(hwloc_get_order_type(hwloc_get_type_order(type)) == type);
+  }
+  for (i = hwloc_get_type_order(HWLOC_OBJ_SYSTEM);
+       i <= hwloc_get_type_order(HWLOC_OBJ_CORE); i++) {
+    assert(i == hwloc_get_type_order(hwloc_get_order_type(i)));
+  }
+
+  /* check that last level is PU */
+  assert(hwloc_get_depth_type(topology, hwloc_topology_get_depth(topology)-1) == HWLOC_OBJ_PU);
+  /* check that other levels are not PU */
+  for(i=1; i<hwloc_topology_get_depth(topology)-1; i++)
+    assert(hwloc_get_depth_type(topology, i) != HWLOC_OBJ_PU);
+
+  /* top-level specific checks */
+  assert(hwloc_get_nbobjs_by_depth(topology, 0) == 1);
+  obj = hwloc_get_root_obj(topology);
+  assert(obj);
+  assert(!obj->parent);
+
+  depth = hwloc_topology_get_depth(topology);
+
+  /* check each level */
+  for(i=0; i<depth; i++) {
+    unsigned width = hwloc_get_nbobjs_by_depth(topology, i);
+    struct hwloc_obj *prev = NULL;
+
+    /* check each object of the level */
+    for(j=0; j<width; j++) {
+      obj = hwloc_get_obj_by_depth(topology, i, j);
+      assert(obj);
+      assert(obj->depth == i);
+      assert(obj->logical_index == j);
+      /* check that all objects in the level have the same type */
+      if (prev) {
+        assert(hwloc_type_cmp(obj, prev) == HWLOC_TYPE_EQUAL);
+        assert(prev->next_cousin == obj);
+        assert(obj->prev_cousin == prev);
+      }
+      if (obj->complete_cpuset) {
+        if (obj->cpuset)
+          assert(hwloc_bitmap_isincluded(obj->cpuset, obj->complete_cpuset));
+        if (obj->online_cpuset)
+          assert(hwloc_bitmap_isincluded(obj->online_cpuset, obj->complete_cpuset));
+        if (obj->allowed_cpuset)
+          assert(hwloc_bitmap_isincluded(obj->allowed_cpuset, obj->complete_cpuset));
+      }
+      if (obj->complete_nodeset) {
+        if (obj->nodeset)
+          assert(hwloc_bitmap_isincluded(obj->nodeset, obj->complete_nodeset));
+        if (obj->allowed_nodeset)
+          assert(hwloc_bitmap_isincluded(obj->allowed_nodeset, obj->complete_nodeset));
+      }
+      /* check children */
+      hwloc__check_children(obj);
+      prev = obj;
+    }
+
+    /* check first object of the level */
+    obj = hwloc_get_obj_by_depth(topology, i, 0);
+    assert(obj);
+    assert(!obj->prev_cousin);
+
+    /* check type */
+    assert(hwloc_get_depth_type(topology, i) == obj->type);
+    assert(i == (unsigned) hwloc_get_type_depth(topology, obj->type) ||
+           HWLOC_TYPE_DEPTH_MULTIPLE == hwloc_get_type_depth(topology, obj->type));
+
+    /* check last object of the level */
+    obj = hwloc_get_obj_by_depth(topology, i, width-1);
+    assert(obj);
+    assert(!obj->next_cousin);
+
+    /* check last+1 object of the level */
+    obj = hwloc_get_obj_by_depth(topology, i, width);
+    assert(!obj);
+  }
+
+  /* check bottom objects */
+  assert(hwloc_get_nbobjs_by_depth(topology, depth-1) > 0);
+  for(j=0; j<hwloc_get_nbobjs_by_depth(topology, depth-1); j++) {
+    obj = hwloc_get_obj_by_depth(topology, depth-1, j);
+    assert(obj);
+    assert(obj->type == HWLOC_OBJ_PU);
+  }
+
+  /* check relative depths */
+  obj = hwloc_get_root_obj(topology);
+  assert(obj->depth == 0);
+  hwloc__check_children_depth(topology, obj);
+}
+
+const struct hwloc_topology_support *
+hwloc_topology_get_support(struct hwloc_topology * topology)
+{
+  return &topology->support;
+}
diff --git a/ext/hwloc/src/traversal.c b/ext/hwloc/src/traversal.c
new file mode 100644
index 000000000..40f1263e4
--- /dev/null
+++ b/ext/hwloc/src/traversal.c
@@ -0,0 +1,623 @@
+/*
+ * Copyright © 2009 CNRS
+ * Copyright © 2009-2012 inria. All rights reserved.
+ * Copyright © 2009-2010 Université Bordeaux 1
+ * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+#include <private/autogen/config.h>
+#include <hwloc.h>
+#include <private/private.h>
+#include <private/misc.h>
+#include <private/debug.h>
+#ifdef HAVE_STRINGS_H
+#include <strings.h>
+#endif /* HAVE_STRINGS_H */
+
+int
+hwloc_get_type_depth (struct hwloc_topology *topology, hwloc_obj_type_t type)
+{
+  return topology->type_depth[type];
+}
+
+hwloc_obj_type_t
+hwloc_get_depth_type (hwloc_topology_t topology, unsigned depth)
+{
+  if (depth >= topology->nb_levels)
+    switch (depth) {
+    case HWLOC_TYPE_DEPTH_BRIDGE:
+      return HWLOC_OBJ_BRIDGE;
+    case HWLOC_TYPE_DEPTH_PCI_DEVICE:
+      return HWLOC_OBJ_PCI_DEVICE;
+    case HWLOC_TYPE_DEPTH_OS_DEVICE:
+      return HWLOC_OBJ_OS_DEVICE;
+    default:
+      return (hwloc_obj_type_t) -1;
+    }
+  return topology->levels[depth][0]->type;
+}
+
+unsigned
+hwloc_get_nbobjs_by_depth (struct hwloc_topology *topology, unsigned depth)
+{
+  if (depth >= topology->nb_levels)
+    switch (depth) {
+    case HWLOC_TYPE_DEPTH_BRIDGE:
+      return topology->bridge_nbobjects;
+    case HWLOC_TYPE_DEPTH_PCI_DEVICE:
+      return topology->pcidev_nbobjects;
+    case HWLOC_TYPE_DEPTH_OS_DEVICE:
+      return topology->osdev_nbobjects;
+    default:
+      return 0;
+    }
+  return topology->level_nbobjects[depth];
+}
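+
+/* Illustrative sketch (not upstream code): enumerating all cores with the
+ * lookup helpers above and hwloc_get_obj_by_depth() below.
+ *
+ *   int d = hwloc_get_type_depth(topology, HWLOC_OBJ_CORE);
+ *   if (d != HWLOC_TYPE_DEPTH_UNKNOWN && d != HWLOC_TYPE_DEPTH_MULTIPLE) {
+ *     unsigned k, n = hwloc_get_nbobjs_by_depth(topology, d);
+ *     for (k = 0; k < n; k++)
+ *       use_core(hwloc_get_obj_by_depth(topology, d, k));  // hypothetical use_core()
+ *   }
+ */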
+
+struct hwloc_obj *
+hwloc_get_obj_by_depth (struct hwloc_topology *topology, unsigned depth, unsigned idx)
+{
+  if (depth >= topology->nb_levels)
+    switch (depth) {
+    case HWLOC_TYPE_DEPTH_BRIDGE:
+      return idx < topology->bridge_nbobjects ? topology->bridge_level[idx] : NULL;
+    case HWLOC_TYPE_DEPTH_PCI_DEVICE:
+      return idx < topology->pcidev_nbobjects ? topology->pcidev_level[idx] : NULL;
+    case HWLOC_TYPE_DEPTH_OS_DEVICE:
+      return idx < topology->osdev_nbobjects ? topology->osdev_level[idx] : NULL;
+    default:
+      return NULL;
+    }
+  if (idx >= topology->level_nbobjects[depth])
+    return NULL;
+  return topology->levels[depth][idx];
+}
+
+unsigned hwloc_get_closest_objs (struct hwloc_topology *topology, struct hwloc_obj *src, struct hwloc_obj **objs, unsigned max)
+{
+  struct hwloc_obj *parent, *nextparent, **src_objs;
+  int i,src_nbobjects;
+  unsigned stored = 0;
+
+  if (!src->cpuset)
+    return 0;
+
+  src_nbobjects = topology->level_nbobjects[src->depth];
+  src_objs = topology->levels[src->depth];
+
+  parent = src;
+  while (stored < max) {
+    while (1) {
+      nextparent = parent->parent;
+      if (!nextparent)
+        goto out;
+      if (!nextparent->cpuset || !hwloc_bitmap_isequal(parent->cpuset, nextparent->cpuset))
+        break;
+      parent = nextparent;
+    }
+
+    if (!nextparent->cpuset)
+      break;
+
+    /* traverse src's objects and find those that are in nextparent and were not in parent */
+    for(i=0; i<src_nbobjects; i++) {
+      if (hwloc_bitmap_isincluded(src_objs[i]->cpuset, nextparent->cpuset)
+          && !hwloc_bitmap_isincluded(src_objs[i]->cpuset, parent->cpuset)) {
+        objs[stored++] = src_objs[i];
+        if (stored == max)
+          goto out;
+      }
+    }
+    parent = nextparent;
+  }
+
+ out:
+  return stored;
+}
+
+static int
+hwloc__get_largest_objs_inside_cpuset (struct hwloc_obj *current, hwloc_const_bitmap_t set,
+                                       struct hwloc_obj ***res, int *max)
+{
+  int gotten = 0;
+  unsigned i;
+
+  /* the caller must ensure this */
+  if (*max <= 0)
+    return 0;
+
+  if (hwloc_bitmap_isequal(current->cpuset, set)) {
+    **res = current;
+    (*res)++;
+    (*max)--;
+    return 1;
+  }
+
+  for (i=0; i<current->arity; i++) {
+    hwloc_bitmap_t subset = hwloc_bitmap_dup(set);
+    int ret;
+
+    /* split out the cpuset part corresponding to this child and see if there's anything to do */
+    if (current->children[i]->cpuset) {
+      hwloc_bitmap_and(subset, subset, current->children[i]->cpuset);
+      if (hwloc_bitmap_iszero(subset)) {
+        hwloc_bitmap_free(subset);
+        continue;
+      }
+    }
+
+    ret = hwloc__get_largest_objs_inside_cpuset (current->children[i], subset, res, max);
+    gotten += ret;
+    hwloc_bitmap_free(subset);
+
+    /* if there is no more room to store remaining objects, return what we got so far */
+    if (!*max)
+      break;
+  }
+
+  return gotten;
+}
+
+int
+hwloc_get_largest_objs_inside_cpuset (struct hwloc_topology *topology, hwloc_const_bitmap_t set,
+                                      struct hwloc_obj **objs, int max)
+{
+  struct hwloc_obj *current = topology->levels[0][0];
+
+  if (!current->cpuset || !hwloc_bitmap_isincluded(set, current->cpuset))
+    return -1;
+
+  if (max <= 0)
+    return 0;
+
+  return hwloc__get_largest_objs_inside_cpuset (current, set, &objs, &max);
+}
+
+const char *
+hwloc_obj_type_string (hwloc_obj_type_t obj)
+{
+  switch (obj)
+    {
+    case HWLOC_OBJ_SYSTEM: return "System";
+    case HWLOC_OBJ_MACHINE: return "Machine";
+    case HWLOC_OBJ_MISC: return "Misc";
+    case HWLOC_OBJ_GROUP: return "Group";
+    case HWLOC_OBJ_NODE: return "NUMANode";
+    case HWLOC_OBJ_SOCKET: return "Socket";
+    case HWLOC_OBJ_CACHE: return "Cache";
+    case HWLOC_OBJ_CORE: return "Core";
+    case HWLOC_OBJ_BRIDGE: return "Bridge";
+    case HWLOC_OBJ_PCI_DEVICE: return "PCIDev";
+    case HWLOC_OBJ_OS_DEVICE: return "OSDev";
+    case HWLOC_OBJ_PU: return "PU";
+    default: return "Unknown";
+    }
+}
+
+hwloc_obj_type_t
+hwloc_obj_type_of_string (const char * string)
+{
+  if (!strcasecmp(string, "System")) return HWLOC_OBJ_SYSTEM;
+  if (!strcasecmp(string, "Machine")) return HWLOC_OBJ_MACHINE;
+  if (!strcasecmp(string, "Misc")) return HWLOC_OBJ_MISC;
+  if (!strcasecmp(string, "Group")) return HWLOC_OBJ_GROUP;
+  if (!strcasecmp(string, "NUMANode") || !strcasecmp(string, "Node")) return
HWLOC_OBJ_NODE; + if (!strcasecmp(string, "Socket")) return HWLOC_OBJ_SOCKET; + if (!strcasecmp(string, "Cache")) return HWLOC_OBJ_CACHE; + if (!strcasecmp(string, "Core")) return HWLOC_OBJ_CORE; + if (!strcasecmp(string, "PU")) return HWLOC_OBJ_PU; + if (!strcasecmp(string, "Bridge")) return HWLOC_OBJ_BRIDGE; + if (!strcasecmp(string, "PCIDev")) return HWLOC_OBJ_PCI_DEVICE; + if (!strcasecmp(string, "OSDev")) return HWLOC_OBJ_OS_DEVICE; + return (hwloc_obj_type_t) -1; +} + +static const char * +hwloc_pci_class_string(unsigned short class_id) +{ + switch ((class_id & 0xff00) >> 8) { + case 0x00: + switch (class_id) { + case 0x0001: return "VGA"; + } + return "PCI"; + case 0x01: + switch (class_id) { + case 0x0100: return "SCSI"; + case 0x0101: return "IDE"; + case 0x0102: return "Flop"; + case 0x0103: return "IPI"; + case 0x0104: return "RAID"; + case 0x0105: return "ATA"; + case 0x0106: return "SATA"; + case 0x0107: return "SAS"; + } + return "Stor"; + case 0x02: + switch (class_id) { + case 0x0200: return "Ether"; + case 0x0201: return "TokRn"; + case 0x0202: return "FDDI"; + case 0x0203: return "ATM"; + case 0x0204: return "ISDN"; + case 0x0205: return "WrdFip"; + case 0x0206: return "PICMG"; + } + return "Net"; + case 0x03: + switch (class_id) { + case 0x0300: return "VGA"; + case 0x0301: return "XGA"; + case 0x0302: return "3D"; + } + return "Disp"; + case 0x04: + switch (class_id) { + case 0x0400: return "Video"; + case 0x0401: return "Audio"; + case 0x0402: return "Phone"; + case 0x0403: return "Auddv"; + } + return "MM"; + case 0x05: + switch (class_id) { + case 0x0500: return "RAM"; + case 0x0501: return "Flash"; + } + return "Mem"; + case 0x06: + switch (class_id) { + case 0x0600: return "Host"; + case 0x0601: return "ISA"; + case 0x0602: return "EISA"; + case 0x0603: return "MC"; + case 0x0604: return "PCI_B"; + case 0x0605: return "PCMCIA"; + case 0x0606: return "Nubus"; + case 0x0607: return "CardBus"; + case 0x0608: return "RACEway"; + case 0x0609: return "PCI_SB"; + case 0x060a: return "IB_B"; + } + return "Bridg"; + case 0x07: + switch (class_id) { + case 0x0700: return "Ser"; + case 0x0701: return "Para"; + case 0x0702: return "MSer"; + case 0x0703: return "Modm"; + case 0x0704: return "GPIB"; + case 0x0705: return "SmrtCrd"; + } + return "Comm"; + case 0x08: + switch (class_id) { + case 0x0800: return "PIC"; + case 0x0801: return "DMA"; + case 0x0802: return "Time"; + case 0x0803: return "RTC"; + case 0x0804: return "HtPl"; + case 0x0805: return "SD-HtPl"; + } + return "Syst"; + case 0x09: + switch (class_id) { + case 0x0900: return "Kbd"; + case 0x0901: return "Pen"; + case 0x0902: return "Mouse"; + case 0x0903: return "Scan"; + case 0x0904: return "Game"; + } + return "In"; + case 0x0a: + return "Dock"; + case 0x0b: + switch (class_id) { + case 0x0b00: return "386"; + case 0x0b01: return "486"; + case 0x0b02: return "Pent"; + case 0x0b10: return "Alpha"; + case 0x0b20: return "PPC"; + case 0x0b30: return "MIPS"; + case 0x0b40: return "CoProc"; + } + return "Proc"; + case 0x0c: + switch (class_id) { + case 0x0c00: return "Firw"; + case 0x0c01: return "ACCES"; + case 0x0c02: return "SSA"; + case 0x0c03: return "USB"; + case 0x0c04: return "Fiber"; + case 0x0c05: return "SMBus"; + case 0x0c06: return "IB"; + case 0x0c07: return "IPMI"; + case 0x0c08: return "SERCOS"; + case 0x0c09: return "CANBUS"; + } + return "Ser"; + case 0x0d: + switch (class_id) { + case 0x0d00: return "IRDA"; + case 0x0d01: return "IR"; + case 0x0d10: return "RF"; + case 0x0d11: return "Blueth"; + 
case 0x0d12: return "BroadB"; + case 0x0d20: return "802.1a"; + case 0x0d21: return "802.1b"; + } + return "Wifi"; + case 0x0e: + switch (class_id) { + case 0x0e00: return "I2O"; + } + return "Intll"; + case 0x0f: + switch (class_id) { + case 0x0f00: return "S-TV"; + case 0x0f01: return "S-Aud"; + case 0x0f02: return "S-Voice"; + case 0x0f03: return "S-Data"; + } + return "Satel"; + case 0x10: + return "Crypt"; + case 0x11: + return "Signl"; + case 0xff: + return "Oth"; + } + return "PCI"; +} + +#define hwloc_memory_size_printf_value(_size, _verbose) \ + ((_size) < (10ULL<<20) || _verbose ? (((_size)>>9)+1)>>1 : (_size) < (10ULL<<30) ? (((_size)>>19)+1)>>1 : (((_size)>>29)+1)>>1) +#define hwloc_memory_size_printf_unit(_size, _verbose) \ + ((_size) < (10ULL<<20) || _verbose ? "KB" : (_size) < (10ULL<<30) ? "MB" : "GB") + +static const char* hwloc_obj_cache_type_letter(hwloc_obj_cache_type_t type) +{ + switch (type) { + case HWLOC_OBJ_CACHE_UNIFIED: return ""; + case HWLOC_OBJ_CACHE_DATA: return "d"; + case HWLOC_OBJ_CACHE_INSTRUCTION: return "i"; + default: return "unknown"; + } +} + +int +hwloc_obj_type_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t obj, int verbose) +{ + hwloc_obj_type_t type = obj->type; + switch (type) { + case HWLOC_OBJ_MISC: + case HWLOC_OBJ_SYSTEM: + case HWLOC_OBJ_MACHINE: + case HWLOC_OBJ_NODE: + case HWLOC_OBJ_SOCKET: + case HWLOC_OBJ_CORE: + case HWLOC_OBJ_PU: + return hwloc_snprintf(string, size, "%s", hwloc_obj_type_string(type)); + case HWLOC_OBJ_CACHE: + return hwloc_snprintf(string, size, "L%u%s%s", obj->attr->cache.depth, + hwloc_obj_cache_type_letter(obj->attr->cache.type), + verbose ? hwloc_obj_type_string(type): ""); + case HWLOC_OBJ_GROUP: + /* TODO: more pretty presentation? */ + if (obj->attr->group.depth != (unsigned) -1) + return hwloc_snprintf(string, size, "%s%u", hwloc_obj_type_string(type), obj->attr->group.depth); + else + return hwloc_snprintf(string, size, "%s", hwloc_obj_type_string(type)); + case HWLOC_OBJ_BRIDGE: + if (verbose) + return snprintf(string, size, "Bridge %s->%s", + obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI ? "PCI" : "Host", + "PCI"); + else + return snprintf(string, size, obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI ? "PCIBridge" : "HostBridge"); + case HWLOC_OBJ_PCI_DEVICE: + return snprintf(string, size, "PCI %04x:%04x", + obj->attr->pcidev.vendor_id, obj->attr->pcidev.device_id); + case HWLOC_OBJ_OS_DEVICE: + switch (obj->attr->osdev.type) { + case HWLOC_OBJ_OSDEV_BLOCK: return hwloc_snprintf(string, size, "Block"); + case HWLOC_OBJ_OSDEV_NETWORK: return hwloc_snprintf(string, size, verbose ? "Network" : "Net"); + case HWLOC_OBJ_OSDEV_OPENFABRICS: return hwloc_snprintf(string, size, "OpenFabrics"); + case HWLOC_OBJ_OSDEV_DMA: return hwloc_snprintf(string, size, "DMA"); + case HWLOC_OBJ_OSDEV_GPU: return hwloc_snprintf(string, size, "GPU"); + case HWLOC_OBJ_OSDEV_COPROC: return hwloc_snprintf(string, size, verbose ? 
"Co-Processor" : "CoProc"); + default: + *string = '\0'; + return 0; + } + break; + default: + if (size > 0) + *string = '\0'; + return 0; + } +} + +int +hwloc_obj_attr_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t obj, const char * separator, int verbose) +{ + const char *prefix = ""; + char *tmp = string; + ssize_t tmplen = size; + int ret = 0; + int res; + + /* make sure we output at least an empty string */ + if (size) + *string = '\0'; + + /* print memory attributes */ + res = 0; + if (verbose) { + if (obj->memory.local_memory) + res = hwloc_snprintf(tmp, tmplen, "%slocal=%lu%s%stotal=%lu%s", + prefix, + (unsigned long) hwloc_memory_size_printf_value(obj->memory.total_memory, verbose), + hwloc_memory_size_printf_unit(obj->memory.total_memory, verbose), + separator, + (unsigned long) hwloc_memory_size_printf_value(obj->memory.local_memory, verbose), + hwloc_memory_size_printf_unit(obj->memory.local_memory, verbose)); + else if (obj->memory.total_memory) + res = hwloc_snprintf(tmp, tmplen, "%stotal=%lu%s", + prefix, + (unsigned long) hwloc_memory_size_printf_value(obj->memory.total_memory, verbose), + hwloc_memory_size_printf_unit(obj->memory.total_memory, verbose)); + } else { + if (obj->memory.total_memory) + res = hwloc_snprintf(tmp, tmplen, "%s%lu%s", + prefix, + (unsigned long) hwloc_memory_size_printf_value(obj->memory.total_memory, verbose), + hwloc_memory_size_printf_unit(obj->memory.total_memory, verbose)); + } + if (res < 0) + return -1; + ret += res; + if (ret > 0) + prefix = separator; + if (res >= tmplen) + res = tmplen>0 ? tmplen - 1 : 0; + tmp += res; + tmplen -= res; + + /* printf type-specific attributes */ + res = 0; + switch (obj->type) { + case HWLOC_OBJ_CACHE: + if (verbose) { + char assoc[32]; + if (obj->attr->cache.associativity == -1) + snprintf(assoc, sizeof(assoc), "%sfully-associative", separator); + else if (obj->attr->cache.associativity == 0) + *assoc = '\0'; + else + snprintf(assoc, sizeof(assoc), "%sways=%d", separator, obj->attr->cache.associativity); + res = hwloc_snprintf(tmp, tmplen, "%ssize=%lu%s%slinesize=%u%s", + prefix, + (unsigned long) hwloc_memory_size_printf_value(obj->attr->cache.size, verbose), + hwloc_memory_size_printf_unit(obj->attr->cache.size, verbose), + separator, obj->attr->cache.linesize, + assoc); + } else + res = hwloc_snprintf(tmp, tmplen, "%s%lu%s", + prefix, + (unsigned long) hwloc_memory_size_printf_value(obj->attr->cache.size, verbose), + hwloc_memory_size_printf_unit(obj->attr->cache.size, verbose)); + break; + case HWLOC_OBJ_BRIDGE: + if (verbose) { + char up[128], down[64]; + /* upstream is PCI or HOST */ + if (obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI) { + char linkspeed[64]= ""; + if (obj->attr->pcidev.linkspeed) + snprintf(linkspeed, sizeof(linkspeed), "%slink=%.2fGB/s", separator, obj->attr->pcidev.linkspeed); + snprintf(up, sizeof(up), "busid=%04x:%02x:%02x.%01x%sid=%04x:%04x%sclass=%04x(%s)%s", + obj->attr->pcidev.domain, obj->attr->pcidev.bus, obj->attr->pcidev.dev, obj->attr->pcidev.func, separator, + obj->attr->pcidev.vendor_id, obj->attr->pcidev.device_id, separator, + obj->attr->pcidev.class_id, hwloc_pci_class_string(obj->attr->pcidev.class_id), linkspeed); + } else + *up = '\0'; + /* downstream is_PCI */ + snprintf(down, sizeof(down), "buses=%04x:[%02x-%02x]", + obj->attr->bridge.downstream.pci.domain, obj->attr->bridge.downstream.pci.secondary_bus, obj->attr->bridge.downstream.pci.subordinate_bus); + if (*up) + res = snprintf(string, size, "%s%s%s", up, separator, down); + 
+      else
+        res = snprintf(string, size, "%s", down);
+    }
+    break;
+  case HWLOC_OBJ_PCI_DEVICE:
+    if (verbose) {
+      char linkspeed[64]= "";
+      if (obj->attr->pcidev.linkspeed)
+        snprintf(linkspeed, sizeof(linkspeed), "%slink=%.2fGB/s", separator, obj->attr->pcidev.linkspeed);
+      res = snprintf(string, size, "busid=%04x:%02x:%02x.%01x%sclass=%04x(%s)%s",
+                     obj->attr->pcidev.domain, obj->attr->pcidev.bus, obj->attr->pcidev.dev, obj->attr->pcidev.func, separator,
+                     obj->attr->pcidev.class_id, hwloc_pci_class_string(obj->attr->pcidev.class_id), linkspeed);
+    }
+    break;
+  default:
+    break;
+  }
+  if (res < 0)
+    return -1;
+  ret += res;
+  if (ret > 0)
+    prefix = separator;
+  if (res >= tmplen)
+    res = tmplen>0 ? tmplen - 1 : 0;
+  tmp += res;
+  tmplen -= res;
+
+  /* print infos */
+  if (verbose) {
+    unsigned i;
+    for(i=0; i<obj->infos_count; i++) {
+      if (strchr(obj->infos[i].value, ' '))
+        res = hwloc_snprintf(tmp, tmplen, "%s%s=\"%s\"",
+                             prefix,
+                             obj->infos[i].name, obj->infos[i].value);
+      else
+        res = hwloc_snprintf(tmp, tmplen, "%s%s=%s",
+                             prefix,
+                             obj->infos[i].name, obj->infos[i].value);
+      if (res < 0)
+        return -1;
+      ret += res;
+      if (res >= tmplen)
+        res = tmplen>0 ? tmplen - 1 : 0;
+      tmp += res;
+      tmplen -= res;
+      if (ret > 0)
+        prefix = separator;
+    }
+  }
+
+  return ret;
+}
+
+
+int
+hwloc_obj_snprintf(char *string, size_t size,
+                   struct hwloc_topology *topology __hwloc_attribute_unused, struct hwloc_obj *l, const char *_indexprefix, int verbose)
+{
+  const char *indexprefix = _indexprefix ? _indexprefix : "#";
+  char os_index[12] = "";
+  char type[64];
+  char attr[128];
+  int attrlen;
+
+  if (l->os_index != (unsigned) -1) {
+    hwloc_snprintf(os_index, 12, "%s%u", indexprefix, l->os_index);
+  }
+
+  hwloc_obj_type_snprintf(type, sizeof(type), l, verbose);
+  attrlen = hwloc_obj_attr_snprintf(attr, sizeof(attr), l, " ", verbose);
+
+  if (attrlen > 0)
+    return hwloc_snprintf(string, size, "%s%s(%s)", type, os_index, attr);
+  else
+    return hwloc_snprintf(string, size, "%s%s", type, os_index);
+}
+
+int hwloc_obj_cpuset_snprintf(char *str, size_t size, size_t nobj, struct hwloc_obj * const *objs)
+{
+  hwloc_bitmap_t set = hwloc_bitmap_alloc();
+  int res;
+  unsigned i;
+
+  hwloc_bitmap_zero(set);
+  for(i=0; i<nobj; i++)
+    if (objs[i]->cpuset)
+      hwloc_bitmap_or(set, set, objs[i]->cpuset);
+
+  res = hwloc_bitmap_snprintf(str, size, set);
+  hwloc_bitmap_free(set);
+  return res;
+}
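+
+/* Illustrative sketch (not upstream code): printing a one-line summary of an
+ * object with the helpers above; the buffer size is arbitrary.
+ *
+ *   char line[128];
+ *   hwloc_obj_snprintf(line, sizeof(line), topology, obj, "#", 0);
+ *   printf("%s\n", line);  // e.g. "PU#3" or "L2(256KB)"
+ */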
diff --git a/ext/lua/Makefile b/ext/lua/Makefile
new file mode 100644
index 000000000..7f5b2ebd0
--- /dev/null
+++ b/ext/lua/Makefile
@@ -0,0 +1,49 @@
+SRC_DIRS = ./src
+MAKE_DIR = ../../make
+
+#DO NOT EDIT BELOW
+
+include ../../config.mk
+include $(MAKE_DIR)/include_$(COMPILER).mk
+
+CFLAGS   = -O2 -Wall
+INCLUDES = -I./includes
+DEFINES  = -DLUA_COMPAT_ALL -DLUA_USE_LINUX
+LIBS     = -lm -Wl,-E -ldl -lreadline
+LFLAGS   =
+Q       ?= @
+
+#CONFIGURE BUILD SYSTEM
+BUILD_DIR = ./$(COMPILER)
+
+VPATH = $(SRC_DIRS)
+FILES = $(notdir $(foreach dir,$(SRC_DIRS),$(wildcard $(dir)/*.c)))
+OBJ   = $(patsubst %.c, $(BUILD_DIR)/%.o, $(FILES))
+
+CPPFLAGS := $(CPPFLAGS) $(DEFINES) $(INCLUDES)
+
+all: $(BUILD_DIR) $(OBJ)
+
+$(BUILD_DIR):
+	@mkdir $(BUILD_DIR)
+
+
+#PATTERN RULES
+$(BUILD_DIR)/%.o:  %.c
+	${Q}$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
+	${Q}$(CC) $(CPPFLAGS) -MT $(@:.d=.o) -MM $< > $(BUILD_DIR)/$*.d
+
+ifeq ($(findstring $(MAKECMDGOALS),clean),)
+-include $(OBJ:.o=.d)
+endif
+
+.PHONY: clean distclean
+
+clean:
+	@rm -rf $(BUILD_DIR)
+
+distclean: clean
+	@rm -f $(TARGET)
+
+
+
diff --git a/ext/lua/includes/lapi.h b/ext/lua/includes/lapi.h
new file mode 100644
index 000000000..0909a3911
--- /dev/null
+++ b/ext/lua/includes/lapi.h
@@ -0,0 +1,24 @@
+/*
+** $Id: lapi.h,v 2.7 2009/11/27 15:37:59 roberto Exp $
+** Auxiliary functions from Lua API
+** See Copyright Notice in lua.h
+*/
+
+#ifndef lapi_h
+#define lapi_h
+
+
+#include "llimits.h"
+#include "lstate.h"
+
+#define api_incr_top(L)   {L->top++; api_check(L, L->top <= L->ci->top, \
+				"stack overflow");}
+
+#define adjustresults(L,nres) \
+    { if ((nres) == LUA_MULTRET && L->ci->top < L->top) L->ci->top = L->top; }
+
+#define api_checknelems(L,n)	api_check(L, (n) < (L->top - L->ci->func), \
+				  "not enough elements in the stack")
+
+
+#endif
diff --git a/ext/lua/includes/lauxlib.h b/ext/lua/includes/lauxlib.h
new file mode 100644
index 000000000..ac4d15fbb
--- /dev/null
+++ b/ext/lua/includes/lauxlib.h
@@ -0,0 +1,212 @@
+/*
+** $Id: lauxlib.h,v 1.120 2011/11/29 15:55:08 roberto Exp $
+** Auxiliary functions for building Lua libraries
+** See Copyright Notice in lua.h
+*/
+
+
+#ifndef lauxlib_h
+#define lauxlib_h
+
+
+#include <stddef.h>
+#include <stdio.h>
+
+#include "lua.h"
+
+
+
+/* extra error code for `luaL_load' */
+#define LUA_ERRFILE     (LUA_ERRERR+1)
+
+
+typedef struct luaL_Reg {
+  const char *name;
+  lua_CFunction func;
+} luaL_Reg;
+
+
+LUALIB_API void (luaL_checkversion_) (lua_State *L, lua_Number ver);
+#define luaL_checkversion(L)	luaL_checkversion_(L, LUA_VERSION_NUM)
+
+LUALIB_API int (luaL_getmetafield) (lua_State *L, int obj, const char *e);
+LUALIB_API int (luaL_callmeta) (lua_State *L, int obj, const char *e);
+LUALIB_API const char *(luaL_tolstring) (lua_State *L, int idx, size_t *len);
+LUALIB_API int (luaL_argerror) (lua_State *L, int numarg, const char *extramsg);
+LUALIB_API const char *(luaL_checklstring) (lua_State *L, int numArg,
+                                            size_t *l);
+LUALIB_API const char *(luaL_optlstring) (lua_State *L, int numArg,
+                                          const char *def, size_t *l);
+LUALIB_API lua_Number (luaL_checknumber) (lua_State *L, int numArg);
+LUALIB_API lua_Number (luaL_optnumber) (lua_State *L, int nArg, lua_Number def);
+
+LUALIB_API lua_Integer (luaL_checkinteger) (lua_State *L, int numArg);
+LUALIB_API lua_Integer (luaL_optinteger) (lua_State *L, int nArg,
+                                          lua_Integer def);
+LUALIB_API lua_Unsigned (luaL_checkunsigned) (lua_State *L, int numArg);
+LUALIB_API lua_Unsigned (luaL_optunsigned) (lua_State *L, int numArg,
+                                            lua_Unsigned def);
+
+LUALIB_API void (luaL_checkstack) (lua_State *L, int sz, const char *msg);
+LUALIB_API void (luaL_checktype) (lua_State *L, int narg, int t);
+LUALIB_API void (luaL_checkany) (lua_State *L, int narg);
+
+LUALIB_API int   (luaL_newmetatable) (lua_State *L, const char *tname);
+LUALIB_API void  (luaL_setmetatable) (lua_State *L, const char *tname);
+LUALIB_API void *(luaL_testudata) (lua_State *L, int ud, const char *tname);
+LUALIB_API void *(luaL_checkudata) (lua_State *L, int ud, const char *tname);
+
+LUALIB_API void (luaL_where) (lua_State *L, int lvl);
+LUALIB_API int (luaL_error) (lua_State *L, const char *fmt, ...);
+
+LUALIB_API int (luaL_checkoption) (lua_State *L, int narg, const char *def,
+                                   const char *const lst[]);
+
+LUALIB_API int (luaL_fileresult) (lua_State *L, int stat, const char *fname);
+LUALIB_API int (luaL_execresult) (lua_State *L, int stat);
+
+/* pre-defined references */
+#define LUA_NOREF       (-2)
+#define LUA_REFNIL      (-1)
+
+LUALIB_API int (luaL_ref) (lua_State *L, int t);
+LUALIB_API void (luaL_unref) (lua_State *L, int t, int ref);
+
+LUALIB_API int (luaL_loadfilex) (lua_State *L, const char *filename,
+                                 const char *mode);
+
+#define luaL_loadfile(L,f)
luaL_loadfilex(L,f,NULL) + +LUALIB_API int (luaL_loadbufferx) (lua_State *L, const char *buff, size_t sz, + const char *name, const char *mode); +LUALIB_API int (luaL_loadstring) (lua_State *L, const char *s); + +LUALIB_API lua_State *(luaL_newstate) (void); + +LUALIB_API int (luaL_len) (lua_State *L, int idx); + +LUALIB_API const char *(luaL_gsub) (lua_State *L, const char *s, const char *p, + const char *r); + +LUALIB_API void (luaL_setfuncs) (lua_State *L, const luaL_Reg *l, int nup); + +LUALIB_API int (luaL_getsubtable) (lua_State *L, int idx, const char *fname); + +LUALIB_API void (luaL_traceback) (lua_State *L, lua_State *L1, + const char *msg, int level); + +LUALIB_API void (luaL_requiref) (lua_State *L, const char *modname, + lua_CFunction openf, int glb); + +/* +** =============================================================== +** some useful macros +** =============================================================== +*/ + + +#define luaL_newlibtable(L,l) \ + lua_createtable(L, 0, sizeof(l)/sizeof((l)[0]) - 1) + +#define luaL_newlib(L,l) (luaL_newlibtable(L,l), luaL_setfuncs(L,l,0)) + +#define luaL_argcheck(L, cond,numarg,extramsg) \ + ((void)((cond) || luaL_argerror(L, (numarg), (extramsg)))) +#define luaL_checkstring(L,n) (luaL_checklstring(L, (n), NULL)) +#define luaL_optstring(L,n,d) (luaL_optlstring(L, (n), (d), NULL)) +#define luaL_checkint(L,n) ((int)luaL_checkinteger(L, (n))) +#define luaL_optint(L,n,d) ((int)luaL_optinteger(L, (n), (d))) +#define luaL_checklong(L,n) ((long)luaL_checkinteger(L, (n))) +#define luaL_optlong(L,n,d) ((long)luaL_optinteger(L, (n), (d))) + +#define luaL_typename(L,i) lua_typename(L, lua_type(L,(i))) + +#define luaL_dofile(L, fn) \ + (luaL_loadfile(L, fn) || lua_pcall(L, 0, LUA_MULTRET, 0)) + +#define luaL_dostring(L, s) \ + (luaL_loadstring(L, s) || lua_pcall(L, 0, LUA_MULTRET, 0)) + +#define luaL_getmetatable(L,n) (lua_getfield(L, LUA_REGISTRYINDEX, (n))) + +#define luaL_opt(L,f,n,d) (lua_isnoneornil(L,(n)) ? 
(d) : f(L,(n))) + +#define luaL_loadbuffer(L,s,sz,n) luaL_loadbufferx(L,s,sz,n,NULL) + + +/* +** {====================================================== +** Generic Buffer manipulation +** ======================================================= +*/ + +typedef struct luaL_Buffer { + char *b; /* buffer address */ + size_t size; /* buffer size */ + size_t n; /* number of characters in buffer */ + lua_State *L; + char initb[LUAL_BUFFERSIZE]; /* initial buffer */ +} luaL_Buffer; + + +#define luaL_addchar(B,c) \ + ((void)((B)->n < (B)->size || luaL_prepbuffsize((B), 1)), \ + ((B)->b[(B)->n++] = (c))) + +#define luaL_addsize(B,s) ((B)->n += (s)) + +LUALIB_API void (luaL_buffinit) (lua_State *L, luaL_Buffer *B); +LUALIB_API char *(luaL_prepbuffsize) (luaL_Buffer *B, size_t sz); +LUALIB_API void (luaL_addlstring) (luaL_Buffer *B, const char *s, size_t l); +LUALIB_API void (luaL_addstring) (luaL_Buffer *B, const char *s); +LUALIB_API void (luaL_addvalue) (luaL_Buffer *B); +LUALIB_API void (luaL_pushresult) (luaL_Buffer *B); +LUALIB_API void (luaL_pushresultsize) (luaL_Buffer *B, size_t sz); +LUALIB_API char *(luaL_buffinitsize) (lua_State *L, luaL_Buffer *B, size_t sz); + +#define luaL_prepbuffer(B) luaL_prepbuffsize(B, LUAL_BUFFERSIZE) + +/* }====================================================== */ + + + +/* +** {====================================================== +** File handles for IO library +** ======================================================= +*/ + +/* +** A file handle is a userdata with metatable 'LUA_FILEHANDLE' and +** initial structure 'luaL_Stream' (it may contain other fields +** after that initial structure). +*/ + +#define LUA_FILEHANDLE "FILE*" + + +typedef struct luaL_Stream { + FILE *f; /* stream (NULL for incompletely created streams) */ + lua_CFunction closef; /* to close stream (NULL for closed streams) */ +} luaL_Stream; + +/* }====================================================== */ + + + +/* compatibility with old module system */ +#if defined(LUA_COMPAT_MODULE) + +LUALIB_API void (luaL_pushmodule) (lua_State *L, const char *modname, + int sizehint); +LUALIB_API void (luaL_openlib) (lua_State *L, const char *libname, + const luaL_Reg *l, int nup); + +#define luaL_register(L,n,l) (luaL_openlib(L,(n),(l),0)) + +#endif + + +#endif + + diff --git a/ext/lua/includes/lcode.h b/ext/lua/includes/lcode.h new file mode 100644 index 000000000..5a1fa9fea --- /dev/null +++ b/ext/lua/includes/lcode.h @@ -0,0 +1,83 @@ +/* +** $Id: lcode.h,v 1.58 2011/08/30 16:26:41 roberto Exp $ +** Code generator for Lua +** See Copyright Notice in lua.h +*/ + +#ifndef lcode_h +#define lcode_h + +#include "llex.h" +#include "lobject.h" +#include "lopcodes.h" +#include "lparser.h" + + +/* +** Marks the end of a patch list. It is an invalid value both as an absolute +** address, and as a list link (would link an element to itself). 
+*/ +#define NO_JUMP (-1) + + +/* +** grep "ORDER OPR" if you change these enums (ORDER OP) +*/ +typedef enum BinOpr { + OPR_ADD, OPR_SUB, OPR_MUL, OPR_DIV, OPR_MOD, OPR_POW, + OPR_CONCAT, + OPR_EQ, OPR_LT, OPR_LE, + OPR_NE, OPR_GT, OPR_GE, + OPR_AND, OPR_OR, + OPR_NOBINOPR +} BinOpr; + + +typedef enum UnOpr { OPR_MINUS, OPR_NOT, OPR_LEN, OPR_NOUNOPR } UnOpr; + + +#define getcode(fs,e) ((fs)->f->code[(e)->u.info]) + +#define luaK_codeAsBx(fs,o,A,sBx) luaK_codeABx(fs,o,A,(sBx)+MAXARG_sBx) + +#define luaK_setmultret(fs,e) luaK_setreturns(fs, e, LUA_MULTRET) + +#define luaK_jumpto(fs,t) luaK_patchlist(fs, luaK_jump(fs), t) + +LUAI_FUNC int luaK_codeABx (FuncState *fs, OpCode o, int A, unsigned int Bx); +LUAI_FUNC int luaK_codeABC (FuncState *fs, OpCode o, int A, int B, int C); +LUAI_FUNC int luaK_codek (FuncState *fs, int reg, int k); +LUAI_FUNC void luaK_fixline (FuncState *fs, int line); +LUAI_FUNC void luaK_nil (FuncState *fs, int from, int n); +LUAI_FUNC void luaK_reserveregs (FuncState *fs, int n); +LUAI_FUNC void luaK_checkstack (FuncState *fs, int n); +LUAI_FUNC int luaK_stringK (FuncState *fs, TString *s); +LUAI_FUNC int luaK_numberK (FuncState *fs, lua_Number r); +LUAI_FUNC void luaK_dischargevars (FuncState *fs, expdesc *e); +LUAI_FUNC int luaK_exp2anyreg (FuncState *fs, expdesc *e); +LUAI_FUNC void luaK_exp2anyregup (FuncState *fs, expdesc *e); +LUAI_FUNC void luaK_exp2nextreg (FuncState *fs, expdesc *e); +LUAI_FUNC void luaK_exp2val (FuncState *fs, expdesc *e); +LUAI_FUNC int luaK_exp2RK (FuncState *fs, expdesc *e); +LUAI_FUNC void luaK_self (FuncState *fs, expdesc *e, expdesc *key); +LUAI_FUNC void luaK_indexed (FuncState *fs, expdesc *t, expdesc *k); +LUAI_FUNC void luaK_goiftrue (FuncState *fs, expdesc *e); +LUAI_FUNC void luaK_goiffalse (FuncState *fs, expdesc *e); +LUAI_FUNC void luaK_storevar (FuncState *fs, expdesc *var, expdesc *e); +LUAI_FUNC void luaK_setreturns (FuncState *fs, expdesc *e, int nresults); +LUAI_FUNC void luaK_setoneret (FuncState *fs, expdesc *e); +LUAI_FUNC int luaK_jump (FuncState *fs); +LUAI_FUNC void luaK_ret (FuncState *fs, int first, int nret); +LUAI_FUNC void luaK_patchlist (FuncState *fs, int list, int target); +LUAI_FUNC void luaK_patchtohere (FuncState *fs, int list); +LUAI_FUNC void luaK_patchclose (FuncState *fs, int list, int level); +LUAI_FUNC void luaK_concat (FuncState *fs, int *l1, int l2); +LUAI_FUNC int luaK_getlabel (FuncState *fs); +LUAI_FUNC void luaK_prefix (FuncState *fs, UnOpr op, expdesc *v, int line); +LUAI_FUNC void luaK_infix (FuncState *fs, BinOpr op, expdesc *v); +LUAI_FUNC void luaK_posfix (FuncState *fs, BinOpr op, expdesc *v1, + expdesc *v2, int line); +LUAI_FUNC void luaK_setlist (FuncState *fs, int base, int nelems, int tostore); + + +#endif diff --git a/ext/lua/includes/lctype.h b/ext/lua/includes/lctype.h new file mode 100644 index 000000000..99c7d1223 --- /dev/null +++ b/ext/lua/includes/lctype.h @@ -0,0 +1,95 @@ +/* +** $Id: lctype.h,v 1.12 2011/07/15 12:50:29 roberto Exp $ +** 'ctype' functions for Lua +** See Copyright Notice in lua.h +*/ + +#ifndef lctype_h +#define lctype_h + +#include "lua.h" + + +/* +** WARNING: the functions defined here do not necessarily correspond +** to the similar functions in the standard C ctype.h. 
They are +** optimized for the specific needs of Lua +*/ + +#if !defined(LUA_USE_CTYPE) + +#if 'A' == 65 && '0' == 48 +/* ASCII case: can use its own tables; faster and fixed */ +#define LUA_USE_CTYPE 0 +#else +/* must use standard C ctype */ +#define LUA_USE_CTYPE 1 +#endif + +#endif + + +#if !LUA_USE_CTYPE /* { */ + +#include <limits.h> + +#include "llimits.h" + + +#define ALPHABIT 0 +#define DIGITBIT 1 +#define PRINTBIT 2 +#define SPACEBIT 3 +#define XDIGITBIT 4 + + +#define MASK(B) (1 << (B)) + + +/* +** add 1 to char to allow index -1 (EOZ) +*/ +#define testprop(c,p) (luai_ctype_[(c)+1] & (p)) + +/* +** 'lalpha' (Lua alphabetic) and 'lalnum' (Lua alphanumeric) both include '_' +*/ +#define lislalpha(c) testprop(c, MASK(ALPHABIT)) +#define lislalnum(c) testprop(c, (MASK(ALPHABIT) | MASK(DIGITBIT))) +#define lisdigit(c) testprop(c, MASK(DIGITBIT)) +#define lisspace(c) testprop(c, MASK(SPACEBIT)) +#define lisprint(c) testprop(c, MASK(PRINTBIT)) +#define lisxdigit(c) testprop(c, MASK(XDIGITBIT)) + +/* +** this 'ltolower' only works for alphabetic characters +*/ +#define ltolower(c) ((c) | ('A' ^ 'a')) + + +/* two more entries for 0 and -1 (EOZ) */ +LUAI_DDEC const lu_byte luai_ctype_[UCHAR_MAX + 2]; + + +#else /* }{ */ + +/* +** use standard C ctypes +*/ + +#include <ctype.h> + + +#define lislalpha(c) (isalpha(c) || (c) == '_') +#define lislalnum(c) (isalnum(c) || (c) == '_') +#define lisdigit(c) (isdigit(c)) +#define lisspace(c) (isspace(c)) +#define lisprint(c) (isprint(c)) +#define lisxdigit(c) (isxdigit(c)) + +#define ltolower(c) (tolower(c)) + +#endif /* } */ + +#endif + diff --git a/ext/lua/includes/ldebug.h b/ext/lua/includes/ldebug.h new file mode 100644 index 000000000..fe39556b0 --- /dev/null +++ b/ext/lua/includes/ldebug.h @@ -0,0 +1,34 @@ +/* +** $Id: ldebug.h,v 2.7 2011/10/07 20:45:19 roberto Exp $ +** Auxiliary functions from Debug Interface module +** See Copyright Notice in lua.h +*/ + +#ifndef ldebug_h +#define ldebug_h + + +#include "lstate.h" + + +#define pcRel(pc, p) (cast(int, (pc) - (p)->code) - 1) + +#define getfuncline(f,pc) (((f)->lineinfo) ?
(f)->lineinfo[pc] : 0) + +#define resethookcount(L) (L->hookcount = L->basehookcount) + +/* Active Lua function (given call info) */ +#define ci_func(ci) (clLvalue((ci)->func)) + + +LUAI_FUNC l_noret luaG_typeerror (lua_State *L, const TValue *o, + const char *opname); +LUAI_FUNC l_noret luaG_concaterror (lua_State *L, StkId p1, StkId p2); +LUAI_FUNC l_noret luaG_aritherror (lua_State *L, const TValue *p1, + const TValue *p2); +LUAI_FUNC l_noret luaG_ordererror (lua_State *L, const TValue *p1, + const TValue *p2); +LUAI_FUNC l_noret luaG_runerror (lua_State *L, const char *fmt, ...); +LUAI_FUNC l_noret luaG_errormsg (lua_State *L); + +#endif diff --git a/ext/lua/includes/ldo.h b/ext/lua/includes/ldo.h new file mode 100644 index 000000000..27b837d99 --- /dev/null +++ b/ext/lua/includes/ldo.h @@ -0,0 +1,46 @@ +/* +** $Id: ldo.h,v 2.20 2011/11/29 15:55:08 roberto Exp $ +** Stack and Call structure of Lua +** See Copyright Notice in lua.h +*/ + +#ifndef ldo_h +#define ldo_h + + +#include "lobject.h" +#include "lstate.h" +#include "lzio.h" + + +#define luaD_checkstack(L,n) if (L->stack_last - L->top <= (n)) \ + luaD_growstack(L, n); else condmovestack(L); + + +#define incr_top(L) {L->top++; luaD_checkstack(L,0);} + +#define savestack(L,p) ((char *)(p) - (char *)L->stack) +#define restorestack(L,n) ((TValue *)((char *)L->stack + (n))) + + +/* type of protected functions, to be ran by `runprotected' */ +typedef void (*Pfunc) (lua_State *L, void *ud); + +LUAI_FUNC int luaD_protectedparser (lua_State *L, ZIO *z, const char *name, + const char *mode); +LUAI_FUNC void luaD_hook (lua_State *L, int event, int line); +LUAI_FUNC int luaD_precall (lua_State *L, StkId func, int nresults); +LUAI_FUNC void luaD_call (lua_State *L, StkId func, int nResults, + int allowyield); +LUAI_FUNC int luaD_pcall (lua_State *L, Pfunc func, void *u, + ptrdiff_t oldtop, ptrdiff_t ef); +LUAI_FUNC int luaD_poscall (lua_State *L, StkId firstResult); +LUAI_FUNC void luaD_reallocstack (lua_State *L, int newsize); +LUAI_FUNC void luaD_growstack (lua_State *L, int n); +LUAI_FUNC void luaD_shrinkstack (lua_State *L); + +LUAI_FUNC l_noret luaD_throw (lua_State *L, int errcode); +LUAI_FUNC int luaD_rawrunprotected (lua_State *L, Pfunc f, void *ud); + +#endif + diff --git a/ext/lua/includes/lfunc.h b/ext/lua/includes/lfunc.h new file mode 100644 index 000000000..e236a717c --- /dev/null +++ b/ext/lua/includes/lfunc.h @@ -0,0 +1,33 @@ +/* +** $Id: lfunc.h,v 2.8 2012/05/08 13:53:33 roberto Exp $ +** Auxiliary functions to manipulate prototypes and closures +** See Copyright Notice in lua.h +*/ + +#ifndef lfunc_h +#define lfunc_h + + +#include "lobject.h" + + +#define sizeCclosure(n) (cast(int, sizeof(CClosure)) + \ + cast(int, sizeof(TValue)*((n)-1))) + +#define sizeLclosure(n) (cast(int, sizeof(LClosure)) + \ + cast(int, sizeof(TValue *)*((n)-1))) + + +LUAI_FUNC Proto *luaF_newproto (lua_State *L); +LUAI_FUNC Closure *luaF_newCclosure (lua_State *L, int nelems); +LUAI_FUNC Closure *luaF_newLclosure (lua_State *L, int nelems); +LUAI_FUNC UpVal *luaF_newupval (lua_State *L); +LUAI_FUNC UpVal *luaF_findupval (lua_State *L, StkId level); +LUAI_FUNC void luaF_close (lua_State *L, StkId level); +LUAI_FUNC void luaF_freeproto (lua_State *L, Proto *f); +LUAI_FUNC void luaF_freeupval (lua_State *L, UpVal *uv); +LUAI_FUNC const char *luaF_getlocalname (const Proto *func, int local_number, + int pc); + + +#endif diff --git a/ext/lua/includes/lgc.h b/ext/lua/includes/lgc.h new file mode 100644 index 000000000..dee270b4d --- /dev/null +++ 
b/ext/lua/includes/lgc.h @@ -0,0 +1,157 @@ +/* +** $Id: lgc.h,v 2.58 2012/09/11 12:53:08 roberto Exp $ +** Garbage Collector +** See Copyright Notice in lua.h +*/ + +#ifndef lgc_h +#define lgc_h + + +#include "lobject.h" +#include "lstate.h" + +/* +** Collectable objects may have one of three colors: white, which +** means the object is not marked; gray, which means the +** object is marked, but its references may be not marked; and +** black, which means that the object and all its references are marked. +** The main invariant of the garbage collector, while marking objects, +** is that a black object can never point to a white one. Moreover, +** any gray object must be in a "gray list" (gray, grayagain, weak, +** allweak, ephemeron) so that it can be visited again before finishing +** the collection cycle. These lists have no meaning when the invariant +** is not being enforced (e.g., sweep phase). +*/ + + + +/* how much to allocate before next GC step */ +#if !defined(GCSTEPSIZE) +/* ~100 small strings */ +#define GCSTEPSIZE (cast_int(100 * sizeof(TString))) +#endif + + +/* +** Possible states of the Garbage Collector +*/ +#define GCSpropagate 0 +#define GCSatomic 1 +#define GCSsweepstring 2 +#define GCSsweepudata 3 +#define GCSsweep 4 +#define GCSpause 5 + + +#define issweepphase(g) \ + (GCSsweepstring <= (g)->gcstate && (g)->gcstate <= GCSsweep) + +#define isgenerational(g) ((g)->gckind == KGC_GEN) + +/* +** macros to tell when main invariant (white objects cannot point to black +** ones) must be kept. During a non-generational collection, the sweep +** phase may break the invariant, as objects turned white may point to +** still-black objects. The invariant is restored when sweep ends and +** all objects are white again. During a generational collection, the +** invariant must be kept all times. +*/ + +#define keepinvariant(g) (isgenerational(g) || g->gcstate <= GCSatomic) + + +/* +** Outside the collector, the state in generational mode is kept in +** 'propagate', so 'keepinvariant' is always true. 
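+*/

+/*
+** [editor's note] A compiled-out sketch of how the invariant above is
+** preserved on writes: when a black object gains a reference to a white
+** value, either the value is marked at once (luaC_barrier_) or the black
+** object is turned gray again to be revisited in the atomic step
+** (luaC_barrierback_, used for tables). The hypothetical helper below
+** just mirrors the barrier macros defined later in this header.
+*/
+#if 0
+static void barrier_sketch (lua_State *L, Table *t, const TValue *v) {
+  if (valiswhite(v) && isblack(obj2gco(t)))
+    luaC_barrierback_(L, obj2gco(t));   /* re-gray 't'; see macros below */
+}
+#endif

+/*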
+*/ +#define keepinvariantout(g) \ + check_exp(g->gcstate == GCSpropagate || !isgenerational(g), \ + g->gcstate <= GCSatomic) + + +/* +** some useful bit tricks +*/ +#define resetbits(x,m) ((x) &= cast(lu_byte, ~(m))) +#define setbits(x,m) ((x) |= (m)) +#define testbits(x,m) ((x) & (m)) +#define bitmask(b) (1<<(b)) +#define bit2mask(b1,b2) (bitmask(b1) | bitmask(b2)) +#define l_setbit(x,b) setbits(x, bitmask(b)) +#define resetbit(x,b) resetbits(x, bitmask(b)) +#define testbit(x,b) testbits(x, bitmask(b)) + + +/* Layout for bit use in `marked' field: */ +#define WHITE0BIT 0 /* object is white (type 0) */ +#define WHITE1BIT 1 /* object is white (type 1) */ +#define BLACKBIT 2 /* object is black */ +#define FINALIZEDBIT 3 /* object has been separated for finalization */ +#define SEPARATED 4 /* object is in 'finobj' list or in 'tobefnz' */ +#define FIXEDBIT 5 /* object is fixed (should not be collected) */ +#define OLDBIT 6 /* object is old (only in generational mode) */ +/* bit 7 is currently used by tests (luaL_checkmemory) */ + +#define WHITEBITS bit2mask(WHITE0BIT, WHITE1BIT) + + +#define iswhite(x) testbits((x)->gch.marked, WHITEBITS) +#define isblack(x) testbit((x)->gch.marked, BLACKBIT) +#define isgray(x) /* neither white nor black */ \ + (!testbits((x)->gch.marked, WHITEBITS | bitmask(BLACKBIT))) + +#define isold(x) testbit((x)->gch.marked, OLDBIT) + +/* MOVE OLD rule: whenever an object is moved to the beginning of + a GC list, its old bit must be cleared */ +#define resetoldbit(o) resetbit((o)->gch.marked, OLDBIT) + +#define otherwhite(g) (g->currentwhite ^ WHITEBITS) +#define isdeadm(ow,m) (!(((m) ^ WHITEBITS) & (ow))) +#define isdead(g,v) isdeadm(otherwhite(g), (v)->gch.marked) + +#define changewhite(x) ((x)->gch.marked ^= WHITEBITS) +#define gray2black(x) l_setbit((x)->gch.marked, BLACKBIT) + +#define valiswhite(x) (iscollectable(x) && iswhite(gcvalue(x))) + +#define luaC_white(g) cast(lu_byte, (g)->currentwhite & WHITEBITS) + + +#define luaC_condGC(L,c) \ + {if (G(L)->GCdebt > 0) {c;}; condchangemem(L);} +#define luaC_checkGC(L) luaC_condGC(L, luaC_step(L);) + + +#define luaC_barrier(L,p,v) { if (valiswhite(v) && isblack(obj2gco(p))) \ + luaC_barrier_(L,obj2gco(p),gcvalue(v)); } + +#define luaC_barrierback(L,p,v) { if (valiswhite(v) && isblack(obj2gco(p))) \ + luaC_barrierback_(L,p); } + +#define luaC_objbarrier(L,p,o) \ + { if (iswhite(obj2gco(o)) && isblack(obj2gco(p))) \ + luaC_barrier_(L,obj2gco(p),obj2gco(o)); } + +#define luaC_objbarrierback(L,p,o) \ + { if (iswhite(obj2gco(o)) && isblack(obj2gco(p))) luaC_barrierback_(L,p); } + +#define luaC_barrierproto(L,p,c) \ + { if (isblack(obj2gco(p))) luaC_barrierproto_(L,p,c); } + +LUAI_FUNC void luaC_freeallobjects (lua_State *L); +LUAI_FUNC void luaC_step (lua_State *L); +LUAI_FUNC void luaC_forcestep (lua_State *L); +LUAI_FUNC void luaC_runtilstate (lua_State *L, int statesmask); +LUAI_FUNC void luaC_fullgc (lua_State *L, int isemergency); +LUAI_FUNC GCObject *luaC_newobj (lua_State *L, int tt, size_t sz, + GCObject **list, int offset); +LUAI_FUNC void luaC_barrier_ (lua_State *L, GCObject *o, GCObject *v); +LUAI_FUNC void luaC_barrierback_ (lua_State *L, GCObject *o); +LUAI_FUNC void luaC_barrierproto_ (lua_State *L, Proto *p, Closure *c); +LUAI_FUNC void luaC_checkfinalizer (lua_State *L, GCObject *o, Table *mt); +LUAI_FUNC void luaC_checkupvalcolor (global_State *g, UpVal *uv); +LUAI_FUNC void luaC_changemode (lua_State *L, int mode); + +#endif diff --git a/ext/lua/includes/llex.h b/ext/lua/includes/llex.h new file mode 
100644 index 000000000..9ca8a2994 --- /dev/null +++ b/ext/lua/includes/llex.h @@ -0,0 +1,78 @@ +/* +** $Id: llex.h,v 1.72 2011/11/30 12:43:51 roberto Exp $ +** Lexical Analyzer +** See Copyright Notice in lua.h +*/ + +#ifndef llex_h +#define llex_h + +#include "lobject.h" +#include "lzio.h" + + +#define FIRST_RESERVED 257 + + + +/* +* WARNING: if you change the order of this enumeration, +* grep "ORDER RESERVED" +*/ +enum RESERVED { + /* terminal symbols denoted by reserved words */ + TK_AND = FIRST_RESERVED, TK_BREAK, + TK_DO, TK_ELSE, TK_ELSEIF, TK_END, TK_FALSE, TK_FOR, TK_FUNCTION, + TK_GOTO, TK_IF, TK_IN, TK_LOCAL, TK_NIL, TK_NOT, TK_OR, TK_REPEAT, + TK_RETURN, TK_THEN, TK_TRUE, TK_UNTIL, TK_WHILE, + /* other terminal symbols */ + TK_CONCAT, TK_DOTS, TK_EQ, TK_GE, TK_LE, TK_NE, TK_DBCOLON, TK_EOS, + TK_NUMBER, TK_NAME, TK_STRING +}; + +/* number of reserved words */ +#define NUM_RESERVED (cast(int, TK_WHILE-FIRST_RESERVED+1)) + + +typedef union { + lua_Number r; + TString *ts; +} SemInfo; /* semantics information */ + + +typedef struct Token { + int token; + SemInfo seminfo; +} Token; + + +/* state of the lexer plus state of the parser when shared by all + functions */ +typedef struct LexState { + int current; /* current character (charint) */ + int linenumber; /* input line counter */ + int lastline; /* line of last token `consumed' */ + Token t; /* current token */ + Token lookahead; /* look ahead token */ + struct FuncState *fs; /* current function (parser) */ + struct lua_State *L; + ZIO *z; /* input stream */ + Mbuffer *buff; /* buffer for tokens */ + struct Dyndata *dyd; /* dynamic structures used by the parser */ + TString *source; /* current source name */ + TString *envn; /* environment variable name */ + char decpoint; /* locale decimal point */ +} LexState; + + +LUAI_FUNC void luaX_init (lua_State *L); +LUAI_FUNC void luaX_setinput (lua_State *L, LexState *ls, ZIO *z, + TString *source, int firstchar); +LUAI_FUNC TString *luaX_newstring (LexState *ls, const char *str, size_t l); +LUAI_FUNC void luaX_next (LexState *ls); +LUAI_FUNC int luaX_lookahead (LexState *ls); +LUAI_FUNC l_noret luaX_syntaxerror (LexState *ls, const char *s); +LUAI_FUNC const char *luaX_token2str (LexState *ls, int token); + + +#endif diff --git a/ext/lua/includes/llimits.h b/ext/lua/includes/llimits.h new file mode 100644 index 000000000..1b8c79bda --- /dev/null +++ b/ext/lua/includes/llimits.h @@ -0,0 +1,309 @@ +/* +** $Id: llimits.h,v 1.103 2013/02/20 14:08:56 roberto Exp $ +** Limits, basic types, and some other `installation-dependent' definitions +** See Copyright Notice in lua.h +*/ + +#ifndef llimits_h +#define llimits_h + + +#include <limits.h> +#include <stddef.h> + + +#include "lua.h" + + +typedef unsigned LUA_INT32 lu_int32; + +typedef LUAI_UMEM lu_mem; + +typedef LUAI_MEM l_mem; + + + +/* chars used as small naturals (so that `char' is reserved for characters) */ +typedef unsigned char lu_byte; + + +#define MAX_SIZET ((size_t)(~(size_t)0)-2) + +#define MAX_LUMEM ((lu_mem)(~(lu_mem)0)-2) + +#define MAX_LMEM ((l_mem) ((MAX_LUMEM >> 1) - 2)) + + +#define MAX_INT (INT_MAX-2) /* maximum value of an int (-2 for safety) */ + +/* +** conversion of pointer to integer +** this is for hashing only; there is no problem if the integer +** cannot hold the whole pointer value +*/ +#define IntPoint(p) ((unsigned int)(lu_mem)(p)) + + + +/* type to ensure maximum alignment */ +#if !defined(LUAI_USER_ALIGNMENT_T) +#define LUAI_USER_ALIGNMENT_T union { double u; void *s; long l; } +#endif + +typedef LUAI_USER_ALIGNMENT_T
L_Umaxalign; + + +/* result of a `usual argument conversion' over lua_Number */ +typedef LUAI_UACNUMBER l_uacNumber; + + +/* internal assertions for in-house debugging */ +#if defined(lua_assert) +#define check_exp(c,e) (lua_assert(c), (e)) +/* to avoid problems with conditions too long */ +#define lua_longassert(c) { if (!(c)) lua_assert(0); } +#else +#define lua_assert(c) ((void)0) +#define check_exp(c,e) (e) +#define lua_longassert(c) ((void)0) +#endif + +/* +** assertion for checking API calls +*/ +#if !defined(luai_apicheck) + +#if defined(LUA_USE_APICHECK) +#include <assert.h> +#define luai_apicheck(L,e) assert(e) +#else +#define luai_apicheck(L,e) lua_assert(e) +#endif + +#endif + +#define api_check(l,e,msg) luai_apicheck(l,(e) && msg) + + +#if !defined(UNUSED) +#define UNUSED(x) ((void)(x)) /* to avoid warnings */ +#endif + + +#define cast(t, exp) ((t)(exp)) + +#define cast_byte(i) cast(lu_byte, (i)) +#define cast_num(i) cast(lua_Number, (i)) +#define cast_int(i) cast(int, (i)) +#define cast_uchar(i) cast(unsigned char, (i)) + + +/* +** non-return type +*/ +#if defined(__GNUC__) +#define l_noret void __attribute__((noreturn)) +#elif defined(_MSC_VER) +#define l_noret void __declspec(noreturn) +#else +#define l_noret void +#endif + + + +/* +** maximum depth for nested C calls and syntactical nested non-terminals +** in a program. (Value must fit in an unsigned short int.) +*/ +#if !defined(LUAI_MAXCCALLS) +#define LUAI_MAXCCALLS 200 +#endif + +/* +** maximum number of upvalues in a closure (both C and Lua). (Value +** must fit in an unsigned char.) +*/ +#define MAXUPVAL UCHAR_MAX + + +/* +** type for virtual-machine instructions +** must be an unsigned with (at least) 4 bytes (see details in lopcodes.h) +*/ +typedef lu_int32 Instruction; + + + +/* maximum stack for a Lua function */ +#define MAXSTACK 250 + + + +/* minimum size for the string table (must be power of 2) */ +#if !defined(MINSTRTABSIZE) +#define MINSTRTABSIZE 32 +#endif + + +/* minimum size for string buffer */ +#if !defined(LUA_MINBUFFER) +#define LUA_MINBUFFER 32 +#endif + + +#if !defined(lua_lock) +#define lua_lock(L) ((void) 0) +#define lua_unlock(L) ((void) 0) +#endif + +#if !defined(luai_threadyield) +#define luai_threadyield(L) {lua_unlock(L); lua_lock(L);} +#endif + + +/* +** these macros allow user-specific actions on threads when you defined +** LUAI_EXTRASPACE and need to do something extra when a thread is +** created/deleted/resumed/yielded. +*/ +#if !defined(luai_userstateopen) +#define luai_userstateopen(L) ((void)L) +#endif + +#if !defined(luai_userstateclose) +#define luai_userstateclose(L) ((void)L) +#endif + +#if !defined(luai_userstatethread) +#define luai_userstatethread(L,L1) ((void)L) +#endif + +#if !defined(luai_userstatefree) +#define luai_userstatefree(L,L1) ((void)L) +#endif + +#if !defined(luai_userstateresume) +#define luai_userstateresume(L,n) ((void)L) +#endif + +#if !defined(luai_userstateyield) +#define luai_userstateyield(L,n) ((void)L) +#endif + +/* +** lua_number2int is a macro to convert lua_Number to int. +** lua_number2integer is a macro to convert lua_Number to lua_Integer. +** lua_number2unsigned is a macro to convert a lua_Number to a lua_Unsigned. +** lua_unsigned2number is a macro to convert a lua_Unsigned to a lua_Number. +** luai_hashnum is a macro to hash a lua_Number value into an integer. +** The hash must be deterministic and give reasonable values for +** both small and large values (outside the range of integers).
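+*/

+/*
+** [editor's note] The constant 6755399441055744.0 in the IEEE754 trick
+** below is 2^52 + 2^51; adding it to a double whose value fits in an int
+** shifts the integer part into the low 32 bits of the mantissa, so that
+** word can be read back directly. A compiled-out sketch of the idea,
+** assuming 32-bit int and little-endian IEEE754 doubles:
+*/
+#if 0
+static int fast_d2i (double d) {
+  volatile union { double d; int i[2]; } u;
+  u.d = d + 6755399441055744.0;   /* 2^52 + 2^51 */
+  return u.i[0];                  /* low word on little-endian machines */
+}
+#endif

+/*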
+*/ + +#if defined(MS_ASMTRICK) || defined(LUA_MSASMTRICK) /* { */ +/* trick with Microsoft assembler for X86 */ + +#define lua_number2int(i,n) __asm {__asm fld n __asm fistp i} +#define lua_number2integer(i,n) lua_number2int(i, n) +#define lua_number2unsigned(i,n) \ + {__int64 l; __asm {__asm fld n __asm fistp l} i = (unsigned int)l;} + + +#elif defined(LUA_IEEE754TRICK) /* }{ */ +/* the next trick should work on any machine using IEEE754 with + a 32-bit int type */ + +union luai_Cast { double l_d; LUA_INT32 l_p[2]; }; + +#if !defined(LUA_IEEEENDIAN) /* { */ +#define LUAI_EXTRAIEEE \ + static const union luai_Cast ieeeendian = {-(33.0 + 6755399441055744.0)}; +#define LUA_IEEEENDIANLOC (ieeeendian.l_p[1] == 33) +#else +#define LUA_IEEEENDIANLOC LUA_IEEEENDIAN +#define LUAI_EXTRAIEEE /* empty */ +#endif /* } */ + +#define lua_number2int32(i,n,t) \ + { LUAI_EXTRAIEEE \ + volatile union luai_Cast u; u.l_d = (n) + 6755399441055744.0; \ + (i) = (t)u.l_p[LUA_IEEEENDIANLOC]; } + +#define luai_hashnum(i,n) \ + { volatile union luai_Cast u; u.l_d = (n) + 1.0; /* avoid -0 */ \ + (i) = u.l_p[0]; (i) += u.l_p[1]; } /* add double bits for his hash */ + +#define lua_number2int(i,n) lua_number2int32(i, n, int) +#define lua_number2unsigned(i,n) lua_number2int32(i, n, lua_Unsigned) + +/* the trick can be expanded to lua_Integer when it is a 32-bit value */ +#if defined(LUA_IEEELL) +#define lua_number2integer(i,n) lua_number2int32(i, n, lua_Integer) +#endif + +#endif /* } */ + + +/* the following definitions always work, but may be slow */ + +#if !defined(lua_number2int) +#define lua_number2int(i,n) ((i)=(int)(n)) +#endif + +#if !defined(lua_number2integer) +#define lua_number2integer(i,n) ((i)=(lua_Integer)(n)) +#endif + +#if !defined(lua_number2unsigned) /* { */ +/* the following definition assures proper modulo behavior */ +#if defined(LUA_NUMBER_DOUBLE) || defined(LUA_NUMBER_FLOAT) +#include <math.h> +#define SUPUNSIGNED ((lua_Number)(~(lua_Unsigned)0) + 1) +#define lua_number2unsigned(i,n) \ + ((i)=(lua_Unsigned)((n) - floor((n)/SUPUNSIGNED)*SUPUNSIGNED)) +#else +#define lua_number2unsigned(i,n) ((i)=(lua_Unsigned)(n)) +#endif +#endif /* } */ + + +#if !defined(lua_unsigned2number) +/* on several machines, coercion from unsigned to double is slow, + so it may be worth to avoid */ +#define lua_unsigned2number(u) \ + (((u) <= (lua_Unsigned)INT_MAX) ?
(lua_Number)(int)(u) : (lua_Number)(u)) +#endif + + + +#if defined(ltable_c) && !defined(luai_hashnum) + +#include <float.h> +#include <math.h> + +#define luai_hashnum(i,n) { int e; \ + n = l_mathop(frexp)(n, &e) * (lua_Number)(INT_MAX - DBL_MAX_EXP); \ + lua_number2int(i, n); i += e; } + +#endif + + + +/* +** macro to control inclusion of some hard tests on stack reallocation +*/ +#if !defined(HARDSTACKTESTS) +#define condmovestack(L) ((void)0) +#else +/* realloc stack keeping its size */ +#define condmovestack(L) luaD_reallocstack((L), (L)->stacksize) +#endif + +#if !defined(HARDMEMTESTS) +#define condchangemem(L) condmovestack(L) +#else +#define condchangemem(L) \ + ((void)(!(G(L)->gcrunning) || (luaC_fullgc(L, 0), 1))) +#endif + +#endif diff --git a/ext/lua/includes/lmem.h b/ext/lua/includes/lmem.h new file mode 100644 index 000000000..5f850999a --- /dev/null +++ b/ext/lua/includes/lmem.h @@ -0,0 +1,57 @@ +/* +** $Id: lmem.h,v 1.40 2013/02/20 14:08:21 roberto Exp $ +** Interface to Memory Manager +** See Copyright Notice in lua.h +*/ + +#ifndef lmem_h +#define lmem_h + + +#include <stddef.h> + +#include "llimits.h" +#include "lua.h" + + +/* +** This macro avoids the runtime division MAX_SIZET/(e), as 'e' is +** always constant. +** The macro is somewhat complex to avoid warnings: +** +1 avoids warnings of "comparison has constant result"; +** cast to 'void' avoids warnings of "value unused". +*/ +#define luaM_reallocv(L,b,on,n,e) \ + (cast(void, \ + (cast(size_t, (n)+1) > MAX_SIZET/(e)) ? (luaM_toobig(L), 0) : 0), \ + luaM_realloc_(L, (b), (on)*(e), (n)*(e))) + +#define luaM_freemem(L, b, s) luaM_realloc_(L, (b), (s), 0) +#define luaM_free(L, b) luaM_realloc_(L, (b), sizeof(*(b)), 0) +#define luaM_freearray(L, b, n) luaM_reallocv(L, (b), n, 0, sizeof((b)[0])) + +#define luaM_malloc(L,s) luaM_realloc_(L, NULL, 0, (s)) +#define luaM_new(L,t) cast(t *, luaM_malloc(L, sizeof(t))) +#define luaM_newvector(L,n,t) \ + cast(t *, luaM_reallocv(L, NULL, 0, n, sizeof(t))) + +#define luaM_newobject(L,tag,s) luaM_realloc_(L, NULL, tag, (s)) + +#define luaM_growvector(L,v,nelems,size,t,limit,e) \ + if ((nelems)+1 > (size)) \ + ((v)=cast(t *, luaM_growaux_(L,v,&(size),sizeof(t),limit,e))) + +#define luaM_reallocvector(L, v,oldn,n,t) \ + ((v)=cast(t *, luaM_reallocv(L, v, oldn, n, sizeof(t)))) + +LUAI_FUNC l_noret luaM_toobig (lua_State *L); + +/* not to be called directly */ +LUAI_FUNC void *luaM_realloc_ (lua_State *L, void *block, size_t oldsize, + size_t size); +LUAI_FUNC void *luaM_growaux_ (lua_State *L, void *block, int *size, + size_t size_elem, int limit, + const char *what); + +#endif + diff --git a/ext/lua/includes/lobject.h b/ext/lua/includes/lobject.h new file mode 100644 index 000000000..dd23b9143 --- /dev/null +++ b/ext/lua/includes/lobject.h @@ -0,0 +1,607 @@ +/* +** $Id: lobject.h,v 2.71 2012/09/11 18:21:44 roberto Exp $ +** Type definitions for Lua objects +** See Copyright Notice in lua.h +*/ + + +#ifndef lobject_h +#define lobject_h + + +#include <stdarg.h> + + +#include "llimits.h" +#include "lua.h" + + +/* +** Extra tags for non-values +*/ +#define LUA_TPROTO LUA_NUMTAGS +#define LUA_TUPVAL (LUA_NUMTAGS+1) +#define LUA_TDEADKEY (LUA_NUMTAGS+2) + +/* +** number of all possible tags (including LUA_TNONE but excluding DEADKEY) +*/ +#define LUA_TOTALTAGS (LUA_TUPVAL+2) + + +/* +** tags for Tagged Values have the following use of bits: +** bits 0-3: actual tag (a LUA_T* value) +** bits 4-5: variant bits +** bit 6: whether value is collectable +*/ + +#define VARBITS (3 << 4) + + +/* +** LUA_TFUNCTION variants: +** 0 - Lua
function +** 1 - light C function +** 2 - regular C function (closure) +*/ + +/* Variant tags for functions */ +#define LUA_TLCL (LUA_TFUNCTION | (0 << 4)) /* Lua closure */ +#define LUA_TLCF (LUA_TFUNCTION | (1 << 4)) /* light C function */ +#define LUA_TCCL (LUA_TFUNCTION | (2 << 4)) /* C closure */ + + +/* Variant tags for strings */ +#define LUA_TSHRSTR (LUA_TSTRING | (0 << 4)) /* short strings */ +#define LUA_TLNGSTR (LUA_TSTRING | (1 << 4)) /* long strings */ + + +/* Bit mark for collectable types */ +#define BIT_ISCOLLECTABLE (1 << 6) + +/* mark a tag as collectable */ +#define ctb(t) ((t) | BIT_ISCOLLECTABLE) + + +/* +** Union of all collectable objects +*/ +typedef union GCObject GCObject; + + +/* +** Common Header for all collectable objects (in macro form, to be +** included in other objects) +*/ +#define CommonHeader GCObject *next; lu_byte tt; lu_byte marked + + +/* +** Common header in struct form +*/ +typedef struct GCheader { + CommonHeader; +} GCheader; + + + +/* +** Union of all Lua values +*/ +typedef union Value Value; + + +#define numfield lua_Number n; /* numbers */ + + + +/* +** Tagged Values. This is the basic representation of values in Lua, +** an actual value plus a tag with its type. +*/ + +#define TValuefields Value value_; int tt_ + +typedef struct lua_TValue TValue; + + +/* macro defining a nil value */ +#define NILCONSTANT {NULL}, LUA_TNIL + + +#define val_(o) ((o)->value_) +#define num_(o) (val_(o).n) + + +/* raw type tag of a TValue */ +#define rttype(o) ((o)->tt_) + +/* tag with no variants (bits 0-3) */ +#define novariant(x) ((x) & 0x0F) + +/* type tag of a TValue (bits 0-3 for tags + variant bits 4-5) */ +#define ttype(o) (rttype(o) & 0x3F) + +/* type tag of a TValue with no variants (bits 0-3) */ +#define ttypenv(o) (novariant(rttype(o))) + + +/* Macros to test type */ +#define checktag(o,t) (rttype(o) == (t)) +#define checktype(o,t) (ttypenv(o) == (t)) +#define ttisnumber(o) checktag((o), LUA_TNUMBER) +#define ttisnil(o) checktag((o), LUA_TNIL) +#define ttisboolean(o) checktag((o), LUA_TBOOLEAN) +#define ttislightuserdata(o) checktag((o), LUA_TLIGHTUSERDATA) +#define ttisstring(o) checktype((o), LUA_TSTRING) +#define ttisshrstring(o) checktag((o), ctb(LUA_TSHRSTR)) +#define ttislngstring(o) checktag((o), ctb(LUA_TLNGSTR)) +#define ttistable(o) checktag((o), ctb(LUA_TTABLE)) +#define ttisfunction(o) checktype(o, LUA_TFUNCTION) +#define ttisclosure(o) ((rttype(o) & 0x1F) == LUA_TFUNCTION) +#define ttisCclosure(o) checktag((o), ctb(LUA_TCCL)) +#define ttisLclosure(o) checktag((o), ctb(LUA_TLCL)) +#define ttislcf(o) checktag((o), LUA_TLCF) +#define ttisuserdata(o) checktag((o), ctb(LUA_TUSERDATA)) +#define ttisthread(o) checktag((o), ctb(LUA_TTHREAD)) +#define ttisdeadkey(o) checktag((o), LUA_TDEADKEY) + +#define ttisequal(o1,o2) (rttype(o1) == rttype(o2)) + +/* Macros to access values */ +#define nvalue(o) check_exp(ttisnumber(o), num_(o)) +#define gcvalue(o) check_exp(iscollectable(o), val_(o).gc) +#define pvalue(o) check_exp(ttislightuserdata(o), val_(o).p) +#define rawtsvalue(o) check_exp(ttisstring(o), &val_(o).gc->ts) +#define tsvalue(o) (&rawtsvalue(o)->tsv) +#define rawuvalue(o) check_exp(ttisuserdata(o), &val_(o).gc->u) +#define uvalue(o) (&rawuvalue(o)->uv) +#define clvalue(o) check_exp(ttisclosure(o), &val_(o).gc->cl) +#define clLvalue(o) check_exp(ttisLclosure(o), &val_(o).gc->cl.l) +#define clCvalue(o) check_exp(ttisCclosure(o), &val_(o).gc->cl.c) +#define fvalue(o) check_exp(ttislcf(o), val_(o).f) +#define hvalue(o) check_exp(ttistable(o), 
&val_(o).gc->h) +#define bvalue(o) check_exp(ttisboolean(o), val_(o).b) +#define thvalue(o) check_exp(ttisthread(o), &val_(o).gc->th) +/* a dead value may get the 'gc' field, but cannot access its contents */ +#define deadvalue(o) check_exp(ttisdeadkey(o), cast(void *, val_(o).gc)) + +#define l_isfalse(o) (ttisnil(o) || (ttisboolean(o) && bvalue(o) == 0)) + + +#define iscollectable(o) (rttype(o) & BIT_ISCOLLECTABLE) + + +/* Macros for internal tests */ +#define righttt(obj) (ttype(obj) == gcvalue(obj)->gch.tt) + +#define checkliveness(g,obj) \ + lua_longassert(!iscollectable(obj) || \ + (righttt(obj) && !isdead(g,gcvalue(obj)))) + + +/* Macros to set values */ +#define settt_(o,t) ((o)->tt_=(t)) + +#define setnvalue(obj,x) \ + { TValue *io=(obj); num_(io)=(x); settt_(io, LUA_TNUMBER); } + +#define setnilvalue(obj) settt_(obj, LUA_TNIL) + +#define setfvalue(obj,x) \ + { TValue *io=(obj); val_(io).f=(x); settt_(io, LUA_TLCF); } + +#define setpvalue(obj,x) \ + { TValue *io=(obj); val_(io).p=(x); settt_(io, LUA_TLIGHTUSERDATA); } + +#define setbvalue(obj,x) \ + { TValue *io=(obj); val_(io).b=(x); settt_(io, LUA_TBOOLEAN); } + +#define setgcovalue(L,obj,x) \ + { TValue *io=(obj); GCObject *i_g=(x); \ + val_(io).gc=i_g; settt_(io, ctb(gch(i_g)->tt)); } + +#define setsvalue(L,obj,x) \ + { TValue *io=(obj); \ + TString *x_ = (x); \ + val_(io).gc=cast(GCObject *, x_); settt_(io, ctb(x_->tsv.tt)); \ + checkliveness(G(L),io); } + +#define setuvalue(L,obj,x) \ + { TValue *io=(obj); \ + val_(io).gc=cast(GCObject *, (x)); settt_(io, ctb(LUA_TUSERDATA)); \ + checkliveness(G(L),io); } + +#define setthvalue(L,obj,x) \ + { TValue *io=(obj); \ + val_(io).gc=cast(GCObject *, (x)); settt_(io, ctb(LUA_TTHREAD)); \ + checkliveness(G(L),io); } + +#define setclLvalue(L,obj,x) \ + { TValue *io=(obj); \ + val_(io).gc=cast(GCObject *, (x)); settt_(io, ctb(LUA_TLCL)); \ + checkliveness(G(L),io); } + +#define setclCvalue(L,obj,x) \ + { TValue *io=(obj); \ + val_(io).gc=cast(GCObject *, (x)); settt_(io, ctb(LUA_TCCL)); \ + checkliveness(G(L),io); } + +#define sethvalue(L,obj,x) \ + { TValue *io=(obj); \ + val_(io).gc=cast(GCObject *, (x)); settt_(io, ctb(LUA_TTABLE)); \ + checkliveness(G(L),io); } + +#define setdeadvalue(obj) settt_(obj, LUA_TDEADKEY) + + + +#define setobj(L,obj1,obj2) \ + { const TValue *io2=(obj2); TValue *io1=(obj1); \ + io1->value_ = io2->value_; io1->tt_ = io2->tt_; \ + checkliveness(G(L),io1); } + + +/* +** different types of assignments, according to destination +*/ + +/* from stack to (same) stack */ +#define setobjs2s setobj +/* to stack (not from same stack) */ +#define setobj2s setobj +#define setsvalue2s setsvalue +#define sethvalue2s sethvalue +#define setptvalue2s setptvalue +/* from table to same table */ +#define setobjt2t setobj +/* to table */ +#define setobj2t setobj +/* to new object */ +#define setobj2n setobj +#define setsvalue2n setsvalue + + +/* check whether a number is valid (useful only for NaN trick) */ +#define luai_checknum(L,o,c) { /* empty */ } + + +/* +** {====================================================== +** NaN Trick +** ======================================================= +*/ +#if defined(LUA_NANTRICK) + +/* +** numbers are represented in the 'd_' field. All other values have the +** value (NNMARK | tag) in 'tt__'. 
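+*/

+/*
+** [editor's note] Compiled-out sketch of the tag test that the scheme
+** described here relies on: 0x7FF7A500 sets all exponent bits plus some
+** mantissa bits, a NaN pattern that no arithmetic result ever carries,
+** so comparing the high word against NNMASK/NNMARK safely separates
+** numbers from tagged values ('tt_' and the masks are defined below).
+*/
+#if 0
+static int is_real_number (const TValue *o) {
+  return (tt_(o) & NNMASK) != NNMARK;   /* mirrors 'ttisnumber' below */
+}
+#endif

+/*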
A number with such pattern would be +** a "signaled NaN", which is never generated by regular operations by +** the CPU (nor by 'strtod') +*/ + +/* allows for external implementation for part of the trick */ +#if !defined(NNMARK) /* { */ + + +#if !defined(LUA_IEEEENDIAN) +#error option 'LUA_NANTRICK' needs 'LUA_IEEEENDIAN' +#endif + + +#define NNMARK 0x7FF7A500 +#define NNMASK 0x7FFFFF00 + +#undef TValuefields +#undef NILCONSTANT + +#if (LUA_IEEEENDIAN == 0) /* { */ + +/* little endian */ +#define TValuefields \ + union { struct { Value v__; int tt__; } i; double d__; } u +#define NILCONSTANT {{{NULL}, tag2tt(LUA_TNIL)}} +/* field-access macros */ +#define v_(o) ((o)->u.i.v__) +#define d_(o) ((o)->u.d__) +#define tt_(o) ((o)->u.i.tt__) + +#else /* }{ */ + +/* big endian */ +#define TValuefields \ + union { struct { int tt__; Value v__; } i; double d__; } u +#define NILCONSTANT {{tag2tt(LUA_TNIL), {NULL}}} +/* field-access macros */ +#define v_(o) ((o)->u.i.v__) +#define d_(o) ((o)->u.d__) +#define tt_(o) ((o)->u.i.tt__) + +#endif /* } */ + +#endif /* } */ + + +/* correspondence with standard representation */ +#undef val_ +#define val_(o) v_(o) +#undef num_ +#define num_(o) d_(o) + + +#undef numfield +#define numfield /* no such field; numbers are the entire struct */ + +/* basic check to distinguish numbers from non-numbers */ +#undef ttisnumber +#define ttisnumber(o) ((tt_(o) & NNMASK) != NNMARK) + +#define tag2tt(t) (NNMARK | (t)) + +#undef rttype +#define rttype(o) (ttisnumber(o) ? LUA_TNUMBER : tt_(o) & 0xff) + +#undef settt_ +#define settt_(o,t) (tt_(o) = tag2tt(t)) + +#undef setnvalue +#define setnvalue(obj,x) \ + { TValue *io_=(obj); num_(io_)=(x); lua_assert(ttisnumber(io_)); } + +#undef setobj +#define setobj(L,obj1,obj2) \ + { const TValue *o2_=(obj2); TValue *o1_=(obj1); \ + o1_->u = o2_->u; \ + checkliveness(G(L),o1_); } + + +/* +** these redefinitions are not mandatory, but these forms are more efficient +*/ + +#undef checktag +#undef checktype +#define checktag(o,t) (tt_(o) == tag2tt(t)) +#define checktype(o,t) (ctb(tt_(o) | VARBITS) == ctb(tag2tt(t) | VARBITS)) + +#undef ttisequal +#define ttisequal(o1,o2) \ + (ttisnumber(o1) ? 
ttisnumber(o2) : (tt_(o1) == tt_(o2))) + + +#undef luai_checknum +#define luai_checknum(L,o,c) { if (!ttisnumber(o)) c; } + +#endif +/* }====================================================== */ + + + +/* +** {====================================================== +** types and prototypes +** ======================================================= +*/ + + +union Value { + GCObject *gc; /* collectable objects */ + void *p; /* light userdata */ + int b; /* booleans */ + lua_CFunction f; /* light C functions */ + numfield /* numbers */ +}; + + +struct lua_TValue { + TValuefields; +}; + + +typedef TValue *StkId; /* index to stack elements */ + + + + +/* +** Header for string value; string bytes follow the end of this structure +*/ +typedef union TString { + L_Umaxalign dummy; /* ensures maximum alignment for strings */ + struct { + CommonHeader; + lu_byte extra; /* reserved words for short strings; "has hash" for longs */ + unsigned int hash; + size_t len; /* number of characters in string */ + } tsv; +} TString; + + +/* get the actual string (array of bytes) from a TString */ +#define getstr(ts) cast(const char *, (ts) + 1) + +/* get the actual string (array of bytes) from a Lua value */ +#define svalue(o) getstr(rawtsvalue(o)) + + +/* +** Header for userdata; memory area follows the end of this structure +*/ +typedef union Udata { + L_Umaxalign dummy; /* ensures maximum alignment for `local' udata */ + struct { + CommonHeader; + struct Table *metatable; + struct Table *env; + size_t len; /* number of bytes */ + } uv; +} Udata; + + + +/* +** Description of an upvalue for function prototypes +*/ +typedef struct Upvaldesc { + TString *name; /* upvalue name (for debug information) */ + lu_byte instack; /* whether it is in stack */ + lu_byte idx; /* index of upvalue (in stack or in outer function's list) */ +} Upvaldesc; + + +/* +** Description of a local variable for function prototypes +** (used for debug information) +*/ +typedef struct LocVar { + TString *varname; + int startpc; /* first point where variable is active */ + int endpc; /* first point where variable is dead */ +} LocVar; + + +/* +** Function Prototypes +*/ +typedef struct Proto { + CommonHeader; + TValue *k; /* constants used by the function */ + Instruction *code; + struct Proto **p; /* functions defined inside the function */ + int *lineinfo; /* map from opcodes to source lines (debug information) */ + LocVar *locvars; /* information about local variables (debug information) */ + Upvaldesc *upvalues; /* upvalue information */ + union Closure *cache; /* last created closure with this prototype */ + TString *source; /* used for debug information */ + int sizeupvalues; /* size of 'upvalues' */ + int sizek; /* size of `k' */ + int sizecode; + int sizelineinfo; + int sizep; /* size of `p' */ + int sizelocvars; + int linedefined; + int lastlinedefined; + GCObject *gclist; + lu_byte numparams; /* number of fixed parameters */ + lu_byte is_vararg; + lu_byte maxstacksize; /* maximum stack used by this function */ +} Proto; + + + +/* +** Lua Upvalues +*/ +typedef struct UpVal { + CommonHeader; + TValue *v; /* points to stack or to its own value */ + union { + TValue value; /* the value (when closed) */ + struct { /* double linked list (when open) */ + struct UpVal *prev; + struct UpVal *next; + } l; + } u; +} UpVal; + + +/* +** Closures +*/ + +#define ClosureHeader \ + CommonHeader; lu_byte nupvalues; GCObject *gclist + +typedef struct CClosure { + ClosureHeader; + lua_CFunction f; + TValue upvalue[1]; /* list of upvalues */ +} 
CClosure; + + +typedef struct LClosure { + ClosureHeader; + struct Proto *p; + UpVal *upvals[1]; /* list of upvalues */ +} LClosure; + + +typedef union Closure { + CClosure c; + LClosure l; +} Closure; + + +#define isLfunction(o) ttisLclosure(o) + +#define getproto(o) (clLvalue(o)->p) + + +/* +** Tables +*/ + +typedef union TKey { + struct { + TValuefields; + struct Node *next; /* for chaining */ + } nk; + TValue tvk; +} TKey; + + +typedef struct Node { + TValue i_val; + TKey i_key; +} Node; + + +typedef struct Table { + CommonHeader; + lu_byte flags; /* 1<
<p means tagmethod(p) is not present */ + lu_byte lsizenode; /* log2 of size of `node' array */ + struct Table *metatable; + TValue *array; /* array part */ + Node *node; + Node *lastfree; /* any free position is before this position */ + GCObject *gclist; + int sizearray; /* size of `array' array */ +} Table; + + + +/* +** `module' operation for hashing (size is always a power of 2) +*/ +#define lmod(s,size) \ + (check_exp((size&(size-1))==0, (cast(int, (s) & ((size)-1))))) + + +#define twoto(x) (1<<(x)) +#define sizenode(t) (twoto((t)->
lsizenode)) + + +/* +** (address of) a fixed nil value +*/ +#define luaO_nilobject (&luaO_nilobject_) + + +LUAI_DDEC const TValue luaO_nilobject_; + + +LUAI_FUNC int luaO_int2fb (unsigned int x); +LUAI_FUNC int luaO_fb2int (int x); +LUAI_FUNC int luaO_ceillog2 (unsigned int x); +LUAI_FUNC lua_Number luaO_arith (int op, lua_Number v1, lua_Number v2); +LUAI_FUNC int luaO_str2d (const char *s, size_t len, lua_Number *result); +LUAI_FUNC int luaO_hexavalue (int c); +LUAI_FUNC const char *luaO_pushvfstring (lua_State *L, const char *fmt, + va_list argp); +LUAI_FUNC const char *luaO_pushfstring (lua_State *L, const char *fmt, ...); +LUAI_FUNC void luaO_chunkid (char *out, const char *source, size_t len); + + +#endif + diff --git a/ext/lua/includes/lopcodes.h b/ext/lua/includes/lopcodes.h new file mode 100644 index 000000000..07d2b3f39 --- /dev/null +++ b/ext/lua/includes/lopcodes.h @@ -0,0 +1,288 @@ +/* +** $Id: lopcodes.h,v 1.142 2011/07/15 12:50:29 roberto Exp $ +** Opcodes for Lua virtual machine +** See Copyright Notice in lua.h +*/ + +#ifndef lopcodes_h +#define lopcodes_h + +#include "llimits.h" + + +/*=========================================================================== + We assume that instructions are unsigned numbers. + All instructions have an opcode in the first 6 bits. + Instructions can have the following fields: + `A' : 8 bits + `B' : 9 bits + `C' : 9 bits + 'Ax' : 26 bits ('A', 'B', and 'C' together) + `Bx' : 18 bits (`B' and `C' together) + `sBx' : signed Bx + + A signed argument is represented in excess K; that is, the number + value is the unsigned value minus K. K is exactly the maximum value + for that argument (so that -max is represented by 0, and +max is + represented by 2*max), which is half the maximum for the corresponding + unsigned argument. +===========================================================================*/ + + +enum OpMode {iABC, iABx, iAsBx, iAx}; /* basic instruction format */ + + +/* +** size and position of opcode arguments. +*/ +#define SIZE_C 9 +#define SIZE_B 9 +#define SIZE_Bx (SIZE_C + SIZE_B) +#define SIZE_A 8 +#define SIZE_Ax (SIZE_C + SIZE_B + SIZE_A) + +#define SIZE_OP 6 + +#define POS_OP 0 +#define POS_A (POS_OP + SIZE_OP) +#define POS_C (POS_A + SIZE_A) +#define POS_B (POS_C + SIZE_C) +#define POS_Bx POS_C +#define POS_Ax POS_A + + +/* +** limits for opcode arguments. +** we use (signed) int to manipulate most arguments, +** so they must fit in LUAI_BITSINT-1 bits (-1 for sign) +*/ +#if SIZE_Bx < LUAI_BITSINT-1 +#define MAXARG_Bx ((1<<SIZE_Bx)-1) +#define MAXARG_sBx (MAXARG_Bx>>1) /* `sBx' is signed */ +#else +#define MAXARG_Bx MAX_INT +#define MAXARG_sBx MAX_INT +#endif + +#if SIZE_Ax < LUAI_BITSINT-1 +#define MAXARG_Ax ((1<<SIZE_Ax)-1) +#else +#define MAXARG_Ax MAX_INT +#endif + + +#define MAXARG_A ((1<<SIZE_A)-1) +#define MAXARG_B ((1<<SIZE_B)-1) +#define MAXARG_C ((1<<SIZE_C)-1) + + +/* creates a mask with `n' 1 bits at position `p' */ +#define MASK1(n,p) ((~((~(Instruction)0)<<(n)))<<(p)) + +/* creates a mask with `n' 0 bits at position `p' */ +#define MASK0(n,p) (~MASK1(n,p)) + +/* +** the following macros help to manipulate instructions +*/ + +#define GET_OPCODE(i) (cast(OpCode, ((i)>>POS_OP) & MASK1(SIZE_OP,0))) +#define SET_OPCODE(i,o) ((i) = (((i)&MASK0(SIZE_OP,POS_OP)) | \ + ((cast(Instruction, o)<<POS_OP)&MASK1(SIZE_OP,POS_OP)))) + +#define getarg(i,pos,size) (cast(int, ((i)>>pos) & MASK1(size,0))) +#define setarg(i,v,pos,size) ((i) = (((i)&MASK0(size,pos)) | \ + ((cast(Instruction, v)<<pos)&MASK1(size,pos)))) + +#define GETARG_A(i) getarg(i, POS_A, SIZE_A) +#define SETARG_A(i,v) setarg(i, v, POS_A, SIZE_A) + +#define GETARG_B(i) getarg(i, POS_B, SIZE_B) +#define SETARG_B(i,v) setarg(i, v, POS_B, SIZE_B) + +#define GETARG_C(i) getarg(i, POS_C, SIZE_C) +#define SETARG_C(i,v) setarg(i, v, POS_C, SIZE_C) + +#define GETARG_Bx(i) getarg(i, POS_Bx, SIZE_Bx) +#define SETARG_Bx(i,v) setarg(i, v, POS_Bx, SIZE_Bx) + +#define GETARG_Ax(i) getarg(i, POS_Ax, SIZE_Ax) +#define SETARG_Ax(i,v) setarg(i, v, POS_Ax, SIZE_Ax) + +#define GETARG_sBx(i) (GETARG_Bx(i)-MAXARG_sBx) +#define SETARG_sBx(i,b) SETARG_Bx((i),cast(unsigned int, (b)+MAXARG_sBx)) + + +#define CREATE_ABC(o,a,b,c) ((cast(Instruction, o)<<POS_OP) \ + | (cast(Instruction, a)<<POS_A) \ + | (cast(Instruction, b)<<POS_B) \ + | (cast(Instruction, c)<<POS_C)) + +#define CREATE_ABx(o,a,bc) ((cast(Instruction, o)<<POS_OP) \ + | (cast(Instruction, a)<<POS_A) \ + | (cast(Instruction, bc)<<POS_Bx)) + +#define CREATE_Ax(o,a) ((cast(Instruction, o)<<POS_OP) \ + | (cast(Instruction, a)<<POS_Ax)) + + +/* +** Macros to operate RK indices +*/ + +/* this bit 1 means constant (0 means register) */ +#define BITRK (1 << (SIZE_B - 1)) + +/* test whether value is a constant */ +#define ISK(x) ((x) & BITRK) + +/* gets the index of the constant */ +#define INDEXK(r) ((int)(r) & ~BITRK) + +#define MAXINDEXRK (BITRK - 1) + +/* code a constant index as a RK value */ +#define RKASK(x) ((x) | BITRK) + + +/* +** invalid register that fits in 8 bits +*/ +#define NO_REG MAXARG_A + + +/* +** R(x) - register +** Kst(x) - constant (in constant table) +** RK(x) == if ISK(x) then Kst(INDEXK(x)) else R(x) +*/ + + +/* +** grep "ORDER OP" if you change these enums +*/ + +typedef enum { +/*---------------------------------------------------------------------- +name args description +------------------------------------------------------------------------*/ +OP_MOVE,/* A B R(A) := R(B) */ +OP_LOADK,/* A Bx R(A) := Kst(Bx) */ +OP_LOADKX,/* A R(A) := Kst(extra arg) */ +OP_LOADBOOL,/* A B C R(A) := (Bool)B; if (C) pc++ */ +OP_LOADNIL,/* A B R(A), R(A+1), ..., R(A+B) := nil */ +OP_GETUPVAL,/* A B R(A) := UpValue[B] */ + +OP_GETTABUP,/* A B C R(A) := UpValue[B][RK(C)] */ +OP_GETTABLE,/* A B C R(A) := R(B)[RK(C)] */ + +OP_SETTABUP,/* A B C UpValue[A][RK(B)] := RK(C) */ +OP_SETUPVAL,/* A B UpValue[B] := R(A) */ +OP_SETTABLE,/* A B C R(A)[RK(B)] := RK(C) */ + +OP_NEWTABLE,/* A B C R(A) := {} (size = B,C) */ + +OP_SELF,/* A B C R(A+1) := R(B); R(A) := R(B)[RK(C)] */ + +OP_ADD,/* A B C R(A) := RK(B) + RK(C) */ +OP_SUB,/* A B C R(A) := RK(B) - RK(C) */ +OP_MUL,/* A B C R(A) := RK(B) * RK(C) */ +OP_DIV,/* A B C R(A) := RK(B) / RK(C) */ +OP_MOD,/* A B C R(A) := RK(B) % RK(C) */ +OP_POW,/* A B C R(A) := RK(B) ^ RK(C) */ +OP_UNM,/* A B R(A) := -R(B) */ +OP_NOT,/* A B R(A) := not R(B) */ +OP_LEN,/* A B R(A) := length of R(B) */ + +OP_CONCAT,/* A B C R(A) := R(B).. ... ..R(C) */ + +OP_JMP,/* A sBx pc+=sBx; if (A) close all upvalues >= R(A) + 1 */ +OP_EQ,/* A B C if ((RK(B) == RK(C)) ~= A) then pc++ */ +OP_LT,/* A B C if ((RK(B) < RK(C)) ~= A) then pc++ */ +OP_LE,/* A B C if ((RK(B) <= RK(C)) ~= A) then pc++ */ + +OP_TEST,/* A C if not (R(A) <=> C) then pc++ */ +OP_TESTSET,/* A B C if (R(B) <=> C) then R(A) := R(B) else pc++ */ + +OP_CALL,/* A B C R(A), ... ,R(A+C-2) := R(A)(R(A+1), ... ,R(A+B-1)) */ +OP_TAILCALL,/* A B C return R(A)(R(A+1), ... ,R(A+B-1)) */ +OP_RETURN,/* A B return R(A), ...
,R(A+B-2) (see note) */ + +OP_FORLOOP,/* A sBx R(A)+=R(A+2); + if R(A) <?= R(A+1) then { pc+=sBx; R(A+3)=R(A) }*/ +OP_FORPREP,/* A sBx R(A)-=R(A+2); pc+=sBx */ + +OP_TFORCALL,/* A C R(A+3), ... ,R(A+2+C) := R(A)(R(A+1), R(A+2)); */ +OP_TFORLOOP,/* A sBx if R(A+1) ~= nil then { R(A)=R(A+1); pc += sBx }*/ + +OP_SETLIST,/* A B C R(A)[(C-1)*FPF+i] := R(A+i), 1 <= i <= B */ + +OP_CLOSURE,/* A Bx R(A) := closure(KPROTO[Bx]) */ + +OP_VARARG,/* A B R(A), R(A+1), ..., R(A+B-2) = vararg */ + +OP_EXTRAARG/* Ax extra (larger) argument for previous opcode */ +} OpCode; + + +#define NUM_OPCODES (cast(int, OP_EXTRAARG) + 1) + + + +/*=========================================================================== + Notes: + (*) In OP_CALL, if (B == 0) then B = top. If (C == 0), then `top' is + set to last_result+1, so next open instruction (OP_CALL, OP_RETURN, + OP_SETLIST) may use `top'. + + (*) In OP_VARARG, if (B == 0) then use actual number of varargs and + set top (like in OP_CALL with C == 0). + + (*) In OP_RETURN, if (B == 0) then return up to `top'. + + (*) In OP_SETLIST, if (B == 0) then B = `top'; if (C == 0) then next + 'instruction' is EXTRAARG(real C). + + (*) In OP_LOADKX, the next 'instruction' is always EXTRAARG. + + (*) For comparisons, A specifies what condition the test should accept + (true or false). + + (*) All `skips' (pc++) assume that next instruction is a jump. +===========================================================================*/ + + +/* +** masks for instruction properties. The format is: +** bits 0-1: op mode +** bits 2-3: C arg mode +** bits 4-5: B arg mode +** bit 6: instruction set register A +** bit 7: operator is a test (next instruction must be a jump) +*/ + +enum OpArgMask { + OpArgN, /* argument is not used */ + OpArgU, /* argument is used */ + OpArgR, /* argument is a register or a jump offset */ + OpArgK /* argument is a constant or register/constant */ +}; + +LUAI_DDEC const lu_byte luaP_opmodes[NUM_OPCODES]; + +#define getOpMode(m) (cast(enum OpMode, luaP_opmodes[m] & 3)) +#define getBMode(m) (cast(enum OpArgMask, (luaP_opmodes[m] >> 4) & 3)) +#define getCMode(m) (cast(enum OpArgMask, (luaP_opmodes[m] >> 2) & 3)) +#define testAMode(m) (luaP_opmodes[m] & (1 << 6)) +#define testTMode(m) (luaP_opmodes[m] & (1 << 7)) + + +LUAI_DDEC const char *const luaP_opnames[NUM_OPCODES+1]; /* opcode names */ + + +/* number of list items to accumulate before a SETLIST instruction */ +#define LFIELDS_PER_FLUSH 50 + + +#endif diff --git a/ext/lua/includes/lparser.h b/ext/lua/includes/lparser.h new file mode 100644 index 000000000..301167d4f --- /dev/null +++ b/ext/lua/includes/lparser.h @@ -0,0 +1,119 @@ +/* +** $Id: lparser.h,v 1.70 2012/05/08 13:53:33 roberto Exp $ +** Lua Parser +** See Copyright Notice in lua.h +*/ + +#ifndef lparser_h +#define lparser_h + +#include "llimits.h" +#include "lobject.h" +#include "lzio.h" + + +/* +** Expression descriptor +*/ + +typedef enum { + VVOID, /* no value */ + VNIL, + VTRUE, + VFALSE, + VK, /* info = index of constant in `k' */ + VKNUM, /* nval = numerical value */ + VNONRELOC, /* info = result register */ + VLOCAL, /* info = local register */ + VUPVAL, /* info = index of upvalue in 'upvalues' */ + VINDEXED, /* t = table register/upvalue; idx = index R/K */ + VJMP, /* info = instruction pc */ + VRELOCABLE, /* info = instruction pc */ + VCALL, /* info = instruction pc */ + VVARARG /* info = instruction pc */ +} expkind; + + +#define vkisvar(k) (VLOCAL <= (k) && (k) <= VINDEXED) +#define vkisinreg(k) ((k) == VNONRELOC || (k) == VLOCAL) + +typedef struct expdesc { + expkind k; + union { + struct { /* for indexed variables (VINDEXED) */ + short idx; /* index (R/K) */ + lu_byte t; /* table (register or upvalue) */ + lu_byte vt; /* whether 't' is register (VLOCAL) or upvalue (VUPVAL) */ + } ind; + int info; /* for generic use */ + lua_Number nval; /* for VKNUM */ + } u; + int t; /* patch list of `exit when true' */ + int f; /* patch list of `exit when false' */ +} expdesc; + + +/* description of active local variable */ +typedef struct Vardesc { + short idx; /* variable index in stack */ +} Vardesc; + + +/* description of pending goto statements and label statements */ +typedef struct Labeldesc { + TString *name; /* label identifier */ + int pc; /* position in code */ + int line; /* line where it appeared */ + lu_byte nactvar; /* local level where it appears in current block */ +} Labeldesc; + + +/* list of labels or gotos */ +typedef struct Labellist { + Labeldesc *arr; /* array */ + int n; /* number of entries in use */ + int size; /* array size */ +} Labellist; + + +/* dynamic structures used by the parser */ +typedef struct Dyndata { + struct { /* list of active local variables */ + Vardesc *arr; + int n; + int size; + } actvar; + Labellist gt; /* list of pending gotos */ + Labellist label; /* list of active labels */ +} Dyndata; + + +/* control of blocks */ +struct BlockCnt; /* defined in lparser.c */ + + +/* state needed to generate code for a given function */ +typedef struct FuncState { + Proto *f; /* current function header */ + Table *h; /* table to find (and reuse) elements in `k' */ + struct FuncState *prev; /* enclosing function */ + struct LexState *ls; /* lexical state */ + struct BlockCnt *bl; /* chain of current blocks */ + int pc; /* next position to code (equivalent to `ncode') */ + int lasttarget; /* 'label' of last 'jump label' */ + int jpc; /* list of pending jumps to `pc' */ + int nk; /* number of elements in `k' */ + int np; /* number of elements in `p' */ + int firstlocal; /* index of first local var (in
Dyndata array) */ + short nlocvars; /* number of elements in 'f->locvars' */ + lu_byte nactvar; /* number of active local variables */ + lu_byte nups; /* number of upvalues */ + lu_byte freereg; /* first free register */ +} FuncState; + + +LUAI_FUNC Closure *luaY_parser (lua_State *L, ZIO *z, Mbuffer *buff, + Dyndata *dyd, const char *name, int firstchar); + + +#endif diff --git a/ext/lua/includes/lstate.h b/ext/lua/includes/lstate.h new file mode 100644 index 000000000..c8a31f5c0 --- /dev/null +++ b/ext/lua/includes/lstate.h @@ -0,0 +1,228 @@ +/* +** $Id: lstate.h,v 2.82 2012/07/02 13:37:04 roberto Exp $ +** Global State +** See Copyright Notice in lua.h +*/ + +#ifndef lstate_h +#define lstate_h + +#include "lua.h" + +#include "lobject.h" +#include "ltm.h" +#include "lzio.h" + + +/* + +** Some notes about garbage-collected objects: All objects in Lua must +** be kept somehow accessible until being freed. +** +** Lua keeps most objects linked in list g->allgc. The link uses field +** 'next' of the CommonHeader. +** +** Strings are kept in several lists headed by the array g->strt.hash. +** +** Open upvalues are not subject to independent garbage collection. They +** are collected together with their respective threads. Lua keeps a +** double-linked list with all open upvalues (g->uvhead) so that it can +** mark objects referred by them. (They are always gray, so they must +** be remarked in the atomic step. Usually their contents would be marked +** when traversing the respective threads, but the thread may already be +** dead, while the upvalue is still accessible through closures.) +** +** Objects with finalizers are kept in the list g->finobj. +** +** The list g->tobefnz links all objects being finalized. + +*/ + + +struct lua_longjmp; /* defined in ldo.c */ + + + +/* extra stack space to handle TM calls and some other extras */ +#define EXTRA_STACK 5 + + +#define BASIC_STACK_SIZE (2*LUA_MINSTACK) + + +/* kinds of Garbage Collection */ +#define KGC_NORMAL 0 +#define KGC_EMERGENCY 1 /* gc was forced by an allocation failure */ +#define KGC_GEN 2 /* generational collection */ + + +typedef struct stringtable { + GCObject **hash; + lu_int32 nuse; /* number of elements */ + int size; +} stringtable; + + +/* +** information about a call +*/ +typedef struct CallInfo { + StkId func; /* function index in the stack */ + StkId top; /* top for this function */ + struct CallInfo *previous, *next; /* dynamic call link */ + short nresults; /* expected number of results from this function */ + lu_byte callstatus; + ptrdiff_t extra; + union { + struct { /* only for Lua functions */ + StkId base; /* base for this function */ + const Instruction *savedpc; + } l; + struct { /* only for C functions */ + int ctx; /* context info. 
in case of yields */ + lua_CFunction k; /* continuation in case of yields */ + ptrdiff_t old_errfunc; + lu_byte old_allowhook; + lu_byte status; + } c; + } u; +} CallInfo; + + +/* +** Bits in CallInfo status +*/ +#define CIST_LUA (1<<0) /* call is running a Lua function */ +#define CIST_HOOKED (1<<1) /* call is running a debug hook */ +#define CIST_REENTRY (1<<2) /* call is running on same invocation of + luaV_execute of previous call */ +#define CIST_YIELDED (1<<3) /* call reentered after suspension */ +#define CIST_YPCALL (1<<4) /* call is a yieldable protected call */ +#define CIST_STAT (1<<5) /* call has an error status (pcall) */ +#define CIST_TAIL (1<<6) /* call was tail called */ +#define CIST_HOOKYIELD (1<<7) /* last hook called yielded */ + + +#define isLua(ci) ((ci)->callstatus & CIST_LUA) + + +/* +** `global state', shared by all threads of this state +*/ +typedef struct global_State { + lua_Alloc frealloc; /* function to reallocate memory */ + void *ud; /* auxiliary data to `frealloc' */ + lu_mem totalbytes; /* number of bytes currently allocated - GCdebt */ + l_mem GCdebt; /* bytes allocated not yet compensated by the collector */ + lu_mem GCmemtrav; /* memory traversed by the GC */ + lu_mem GCestimate; /* an estimate of the non-garbage memory in use */ + stringtable strt; /* hash table for strings */ + TValue l_registry; + unsigned int seed; /* randomized seed for hashes */ + lu_byte currentwhite; + lu_byte gcstate; /* state of garbage collector */ + lu_byte gckind; /* kind of GC running */ + lu_byte gcrunning; /* true if GC is running */ + int sweepstrgc; /* position of sweep in `strt' */ + GCObject *allgc; /* list of all collectable objects */ + GCObject *finobj; /* list of collectable objects with finalizers */ + GCObject **sweepgc; /* current position of sweep in list 'allgc' */ + GCObject **sweepfin; /* current position of sweep in list 'finobj' */ + GCObject *gray; /* list of gray objects */ + GCObject *grayagain; /* list of objects to be traversed atomically */ + GCObject *weak; /* list of tables with weak values */ + GCObject *ephemeron; /* list of ephemeron tables (weak keys) */ + GCObject *allweak; /* list of all-weak tables */ + GCObject *tobefnz; /* list of userdata to be GC */ + UpVal uvhead; /* head of double-linked list of all open upvalues */ + Mbuffer buff; /* temporary buffer for string concatenation */ + int gcpause; /* size of pause between successive GCs */ + int gcmajorinc; /* pause between major collections (only in gen. 
mode) */ + int gcstepmul; /* GC `granularity' */ + lua_CFunction panic; /* to be called in unprotected errors */ + struct lua_State *mainthread; + const lua_Number *version; /* pointer to version number */ + TString *memerrmsg; /* memory-error message */ + TString *tmname[TM_N]; /* array with tag-method names */ + struct Table *mt[LUA_NUMTAGS]; /* metatables for basic types */ +} global_State; + + +/* +** `per thread' state +*/ +struct lua_State { + CommonHeader; + lu_byte status; + StkId top; /* first free slot in the stack */ + global_State *l_G; + CallInfo *ci; /* call info for current function */ + const Instruction *oldpc; /* last pc traced */ + StkId stack_last; /* last free slot in the stack */ + StkId stack; /* stack base */ + int stacksize; + unsigned short nny; /* number of non-yieldable calls in stack */ + unsigned short nCcalls; /* number of nested C calls */ + lu_byte hookmask; + lu_byte allowhook; + int basehookcount; + int hookcount; + lua_Hook hook; + GCObject *openupval; /* list of open upvalues in this stack */ + GCObject *gclist; + struct lua_longjmp *errorJmp; /* current error recover point */ + ptrdiff_t errfunc; /* current error handling function (stack index) */ + CallInfo base_ci; /* CallInfo for first level (C calling Lua) */ +}; + + +#define G(L) (L->l_G) + + +/* +** Union of all collectable objects +*/ +union GCObject { + GCheader gch; /* common header */ + union TString ts; + union Udata u; + union Closure cl; + struct Table h; + struct Proto p; + struct UpVal uv; + struct lua_State th; /* thread */ +}; + + +#define gch(o) (&(o)->gch) + +/* macros to convert a GCObject into a specific value */ +#define rawgco2ts(o) \ + check_exp(novariant((o)->gch.tt) == LUA_TSTRING, &((o)->ts)) +#define gco2ts(o) (&rawgco2ts(o)->tsv) +#define rawgco2u(o) check_exp((o)->gch.tt == LUA_TUSERDATA, &((o)->u)) +#define gco2u(o) (&rawgco2u(o)->uv) +#define gco2lcl(o) check_exp((o)->gch.tt == LUA_TLCL, &((o)->cl.l)) +#define gco2ccl(o) check_exp((o)->gch.tt == LUA_TCCL, &((o)->cl.c)) +#define gco2cl(o) \ + check_exp(novariant((o)->gch.tt) == LUA_TFUNCTION, &((o)->cl)) +#define gco2t(o) check_exp((o)->gch.tt == LUA_TTABLE, &((o)->h)) +#define gco2p(o) check_exp((o)->gch.tt == LUA_TPROTO, &((o)->p)) +#define gco2uv(o) check_exp((o)->gch.tt == LUA_TUPVAL, &((o)->uv)) +#define gco2th(o) check_exp((o)->gch.tt == LUA_TTHREAD, &((o)->th)) + +/* macro to convert any Lua object into a GCObject */ +#define obj2gco(v) (cast(GCObject *, (v))) + + +/* actual number of total bytes allocated */ +#define gettotalbytes(g) ((g)->totalbytes + (g)->GCdebt) + +LUAI_FUNC void luaE_setdebt (global_State *g, l_mem debt); +LUAI_FUNC void luaE_freethread (lua_State *L, lua_State *L1); +LUAI_FUNC CallInfo *luaE_extendCI (lua_State *L); +LUAI_FUNC void luaE_freeCI (lua_State *L); + + +#endif + diff --git a/ext/lua/includes/lstring.h b/ext/lua/includes/lstring.h new file mode 100644 index 000000000..d312ff3d2 --- /dev/null +++ b/ext/lua/includes/lstring.h @@ -0,0 +1,46 @@ +/* +** $Id: lstring.h,v 1.49 2012/02/01 21:57:15 roberto Exp $ +** String table (keep all strings handled by Lua) +** See Copyright Notice in lua.h +*/ + +#ifndef lstring_h +#define lstring_h + +#include "lgc.h" +#include "lobject.h" +#include "lstate.h" + + +#define sizestring(s) (sizeof(union TString)+((s)->len+1)*sizeof(char)) + +#define sizeudata(u) (sizeof(union Udata)+(u)->len) + +#define luaS_newliteral(L, s) (luaS_newlstr(L, "" s, \ + (sizeof(s)/sizeof(char))-1)) + +#define luaS_fix(s) l_setbit((s)->tsv.marked, FIXEDBIT) + + +/* +** test 
whether a string is a reserved word +*/ +#define isreserved(s) ((s)->tsv.tt == LUA_TSHRSTR && (s)->tsv.extra > 0) + + +/* +** equality for short strings, which are always internalized +*/ +#define eqshrstr(a,b) check_exp((a)->tsv.tt == LUA_TSHRSTR, (a) == (b)) + + +LUAI_FUNC unsigned int luaS_hash (const char *str, size_t l, unsigned int seed); +LUAI_FUNC int luaS_eqlngstr (TString *a, TString *b); +LUAI_FUNC int luaS_eqstr (TString *a, TString *b); +LUAI_FUNC void luaS_resize (lua_State *L, int newsize); +LUAI_FUNC Udata *luaS_newudata (lua_State *L, size_t s, Table *e); +LUAI_FUNC TString *luaS_newlstr (lua_State *L, const char *str, size_t l); +LUAI_FUNC TString *luaS_new (lua_State *L, const char *str); + + +#endif diff --git a/ext/lua/includes/ltable.h b/ext/lua/includes/ltable.h new file mode 100644 index 000000000..2f6f5c2dc --- /dev/null +++ b/ext/lua/includes/ltable.h @@ -0,0 +1,41 @@ +/* +** $Id: ltable.h,v 2.16 2011/08/17 20:26:47 roberto Exp $ +** Lua tables (hash) +** See Copyright Notice in lua.h +*/ + +#ifndef ltable_h +#define ltable_h + +#include "lobject.h" + + +#define gnode(t,i) (&(t)->node[i]) +#define gkey(n) (&(n)->i_key.tvk) +#define gval(n) (&(n)->i_val) +#define gnext(n) ((n)->i_key.nk.next) + +#define invalidateTMcache(t) ((t)->flags = 0) + + +LUAI_FUNC const TValue *luaH_getint (Table *t, int key); +LUAI_FUNC void luaH_setint (lua_State *L, Table *t, int key, TValue *value); +LUAI_FUNC const TValue *luaH_getstr (Table *t, TString *key); +LUAI_FUNC const TValue *luaH_get (Table *t, const TValue *key); +LUAI_FUNC TValue *luaH_newkey (lua_State *L, Table *t, const TValue *key); +LUAI_FUNC TValue *luaH_set (lua_State *L, Table *t, const TValue *key); +LUAI_FUNC Table *luaH_new (lua_State *L); +LUAI_FUNC void luaH_resize (lua_State *L, Table *t, int nasize, int nhsize); +LUAI_FUNC void luaH_resizearray (lua_State *L, Table *t, int nasize); +LUAI_FUNC void luaH_free (lua_State *L, Table *t); +LUAI_FUNC int luaH_next (lua_State *L, Table *t, StkId key); +LUAI_FUNC int luaH_getn (Table *t); + + +#if defined(LUA_DEBUG) +LUAI_FUNC Node *luaH_mainposition (const Table *t, const TValue *key); +LUAI_FUNC int luaH_isdummy (Node *n); +#endif + + +#endif diff --git a/ext/lua/includes/ltm.h b/ext/lua/includes/ltm.h new file mode 100644 index 000000000..89bdc19a1 --- /dev/null +++ b/ext/lua/includes/ltm.h @@ -0,0 +1,57 @@ +/* +** $Id: ltm.h,v 2.11 2011/02/28 17:32:10 roberto Exp $ +** Tag methods +** See Copyright Notice in lua.h +*/ + +#ifndef ltm_h +#define ltm_h + + +#include "lobject.h" + + +/* +* WARNING: if you change the order of this enumeration, +* grep "ORDER TM" +*/ +typedef enum { + TM_INDEX, + TM_NEWINDEX, + TM_GC, + TM_MODE, + TM_LEN, + TM_EQ, /* last tag method with `fast' access */ + TM_ADD, + TM_SUB, + TM_MUL, + TM_DIV, + TM_MOD, + TM_POW, + TM_UNM, + TM_LT, + TM_LE, + TM_CONCAT, + TM_CALL, + TM_N /* number of elements in the enum */ +} TMS; + + + +#define gfasttm(g,et,e) ((et) == NULL ? NULL : \ + ((et)->flags & (1u<<(e))) ? 
NULL : luaT_gettm(et, e, (g)->tmname[e])) + +#define fasttm(l,et,e) gfasttm(G(l), et, e) + +#define ttypename(x) luaT_typenames_[(x) + 1] +#define objtypename(x) ttypename(ttypenv(x)) + +LUAI_DDEC const char *const luaT_typenames_[LUA_TOTALTAGS]; + + +LUAI_FUNC const TValue *luaT_gettm (Table *events, TMS event, TString *ename); +LUAI_FUNC const TValue *luaT_gettmbyobj (lua_State *L, const TValue *o, + TMS event); +LUAI_FUNC void luaT_init (lua_State *L); + +#endif diff --git a/ext/lua/includes/lua.h b/ext/lua/includes/lua.h new file mode 100644 index 000000000..eb0482b8f --- /dev/null +++ b/ext/lua/includes/lua.h @@ -0,0 +1,444 @@ +/* +** $Id: lua.h,v 1.285 2013/03/15 13:04:22 roberto Exp $ +** Lua - A Scripting Language +** Lua.org, PUC-Rio, Brazil (http://www.lua.org) +** See Copyright Notice at the end of this file +*/ + + +#ifndef lua_h +#define lua_h + +#include <stdarg.h> +#include <stddef.h> + + +#include "luaconf.h" + + +#define LUA_VERSION_MAJOR "5" +#define LUA_VERSION_MINOR "2" +#define LUA_VERSION_NUM 502 +#define LUA_VERSION_RELEASE "2" + +#define LUA_VERSION "Lua " LUA_VERSION_MAJOR "." LUA_VERSION_MINOR +#define LUA_RELEASE LUA_VERSION "." LUA_VERSION_RELEASE +#define LUA_COPYRIGHT LUA_RELEASE " Copyright (C) 1994-2013 Lua.org, PUC-Rio" +#define LUA_AUTHORS "R. Ierusalimschy, L. H. de Figueiredo, W. Celes" + + +/* mark for precompiled code ('Lua') */ +#define LUA_SIGNATURE "\033Lua" + +/* option for multiple returns in 'lua_pcall' and 'lua_call' */ +#define LUA_MULTRET (-1) + + +/* +** pseudo-indices +*/ +#define LUA_REGISTRYINDEX LUAI_FIRSTPSEUDOIDX +#define lua_upvalueindex(i) (LUA_REGISTRYINDEX - (i)) + + +/* thread status */ +#define LUA_OK 0 +#define LUA_YIELD 1 +#define LUA_ERRRUN 2 +#define LUA_ERRSYNTAX 3 +#define LUA_ERRMEM 4 +#define LUA_ERRGCMM 5 +#define LUA_ERRERR 6 + + +typedef struct lua_State lua_State; + +typedef int (*lua_CFunction) (lua_State *L); + + +/* +** functions that read/write blocks when loading/dumping Lua chunks +*/ +typedef const char * (*lua_Reader) (lua_State *L, void *ud, size_t *sz); + +typedef int (*lua_Writer) (lua_State *L, const void* p, size_t sz, void* ud); + + +/* +** prototype for memory-allocation functions +*/ +typedef void * (*lua_Alloc) (void *ud, void *ptr, size_t osize, size_t nsize); + + +/* +** basic types +*/ +#define LUA_TNONE (-1) + +#define LUA_TNIL 0 +#define LUA_TBOOLEAN 1 +#define LUA_TLIGHTUSERDATA 2 +#define LUA_TNUMBER 3 +#define LUA_TSTRING 4 +#define LUA_TTABLE 5 +#define LUA_TFUNCTION 6 +#define LUA_TUSERDATA 7 +#define LUA_TTHREAD 8 + +#define LUA_NUMTAGS 9 + + + +/* minimum Lua stack available to a C function */ +#define LUA_MINSTACK 20 + + +/* predefined values in the registry */ +#define LUA_RIDX_MAINTHREAD 1 +#define LUA_RIDX_GLOBALS 2 +#define LUA_RIDX_LAST LUA_RIDX_GLOBALS + + +/* type of numbers in Lua */ +typedef LUA_NUMBER lua_Number; + + +/* type for integer functions */ +typedef LUA_INTEGER lua_Integer; + +/* unsigned integer type */ +typedef LUA_UNSIGNED lua_Unsigned; + + + +/* +** generic extra include file +*/ +#if defined(LUA_USER_H) +#include LUA_USER_H +#endif + + +/* +** RCS ident string +*/ +extern const char lua_ident[]; + + +/* +** state manipulation +*/ +LUA_API lua_State *(lua_newstate) (lua_Alloc f, void *ud); +LUA_API void (lua_close) (lua_State *L); +LUA_API lua_State *(lua_newthread) (lua_State *L); + +LUA_API lua_CFunction (lua_atpanic) (lua_State *L, lua_CFunction panicf); + + +LUA_API const lua_Number *(lua_version) (lua_State *L); + + +/* +** basic stack manipulation +*/ +LUA_API int (lua_absindex)
(lua_State *L, int idx); +LUA_API int (lua_gettop) (lua_State *L); +LUA_API void (lua_settop) (lua_State *L, int idx); +LUA_API void (lua_pushvalue) (lua_State *L, int idx); +LUA_API void (lua_remove) (lua_State *L, int idx); +LUA_API void (lua_insert) (lua_State *L, int idx); +LUA_API void (lua_replace) (lua_State *L, int idx); +LUA_API void (lua_copy) (lua_State *L, int fromidx, int toidx); +LUA_API int (lua_checkstack) (lua_State *L, int sz); + +LUA_API void (lua_xmove) (lua_State *from, lua_State *to, int n); + + +/* +** access functions (stack -> C) +*/ + +LUA_API int (lua_isnumber) (lua_State *L, int idx); +LUA_API int (lua_isstring) (lua_State *L, int idx); +LUA_API int (lua_iscfunction) (lua_State *L, int idx); +LUA_API int (lua_isuserdata) (lua_State *L, int idx); +LUA_API int (lua_type) (lua_State *L, int idx); +LUA_API const char *(lua_typename) (lua_State *L, int tp); + +LUA_API lua_Number (lua_tonumberx) (lua_State *L, int idx, int *isnum); +LUA_API lua_Integer (lua_tointegerx) (lua_State *L, int idx, int *isnum); +LUA_API lua_Unsigned (lua_tounsignedx) (lua_State *L, int idx, int *isnum); +LUA_API int (lua_toboolean) (lua_State *L, int idx); +LUA_API const char *(lua_tolstring) (lua_State *L, int idx, size_t *len); +LUA_API size_t (lua_rawlen) (lua_State *L, int idx); +LUA_API lua_CFunction (lua_tocfunction) (lua_State *L, int idx); +LUA_API void *(lua_touserdata) (lua_State *L, int idx); +LUA_API lua_State *(lua_tothread) (lua_State *L, int idx); +LUA_API const void *(lua_topointer) (lua_State *L, int idx); + + +/* +** Comparison and arithmetic functions +*/ + +#define LUA_OPADD 0 /* ORDER TM */ +#define LUA_OPSUB 1 +#define LUA_OPMUL 2 +#define LUA_OPDIV 3 +#define LUA_OPMOD 4 +#define LUA_OPPOW 5 +#define LUA_OPUNM 6 + +LUA_API void (lua_arith) (lua_State *L, int op); + +#define LUA_OPEQ 0 +#define LUA_OPLT 1 +#define LUA_OPLE 2 + +LUA_API int (lua_rawequal) (lua_State *L, int idx1, int idx2); +LUA_API int (lua_compare) (lua_State *L, int idx1, int idx2, int op); + + +/* +** push functions (C -> stack) +*/ +LUA_API void (lua_pushnil) (lua_State *L); +LUA_API void (lua_pushnumber) (lua_State *L, lua_Number n); +LUA_API void (lua_pushinteger) (lua_State *L, lua_Integer n); +LUA_API void (lua_pushunsigned) (lua_State *L, lua_Unsigned n); +LUA_API const char *(lua_pushlstring) (lua_State *L, const char *s, size_t l); +LUA_API const char *(lua_pushstring) (lua_State *L, const char *s); +LUA_API const char *(lua_pushvfstring) (lua_State *L, const char *fmt, + va_list argp); +LUA_API const char *(lua_pushfstring) (lua_State *L, const char *fmt, ...); +LUA_API void (lua_pushcclosure) (lua_State *L, lua_CFunction fn, int n); +LUA_API void (lua_pushboolean) (lua_State *L, int b); +LUA_API void (lua_pushlightuserdata) (lua_State *L, void *p); +LUA_API int (lua_pushthread) (lua_State *L); + + +/* +** get functions (Lua -> stack) +*/ +LUA_API void (lua_getglobal) (lua_State *L, const char *var); +LUA_API void (lua_gettable) (lua_State *L, int idx); +LUA_API void (lua_getfield) (lua_State *L, int idx, const char *k); +LUA_API void (lua_rawget) (lua_State *L, int idx); +LUA_API void (lua_rawgeti) (lua_State *L, int idx, int n); +LUA_API void (lua_rawgetp) (lua_State *L, int idx, const void *p); +LUA_API void (lua_createtable) (lua_State *L, int narr, int nrec); +LUA_API void *(lua_newuserdata) (lua_State *L, size_t sz); +LUA_API int (lua_getmetatable) (lua_State *L, int objindex); +LUA_API void (lua_getuservalue) (lua_State *L, int idx); + + +/* +** set functions (stack -> Lua) +*/ 
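+/*
+** Illustrative sketch, not part of the upstream header: the set functions
+** below pop their operands from the stack, pairing with the push functions
+** above.  The fragment is hypothetical host code and is compiled out.
+*/
+#if 0
+static void example_set (lua_State *L) {
+  lua_pushinteger(L, 42);        /* push value to store */
+  lua_setglobal(L, "answer");    /* pops value; global 'answer' = 42 */
+  lua_createtable(L, 0, 1);      /* push a fresh table (1 hash slot) */
+  lua_pushstring(L, "value");    /* push value for field 'key' */
+  lua_setfield(L, -2, "key");    /* pops value; sets t.key = "value" */
+  lua_setglobal(L, "t");         /* pops table; global 't' = the table */
+}
+#endif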
+LUA_API void (lua_setglobal) (lua_State *L, const char *var); +LUA_API void (lua_settable) (lua_State *L, int idx); +LUA_API void (lua_setfield) (lua_State *L, int idx, const char *k); +LUA_API void (lua_rawset) (lua_State *L, int idx); +LUA_API void (lua_rawseti) (lua_State *L, int idx, int n); +LUA_API void (lua_rawsetp) (lua_State *L, int idx, const void *p); +LUA_API int (lua_setmetatable) (lua_State *L, int objindex); +LUA_API void (lua_setuservalue) (lua_State *L, int idx); + + +/* +** 'load' and 'call' functions (load and run Lua code) +*/ +LUA_API void (lua_callk) (lua_State *L, int nargs, int nresults, int ctx, + lua_CFunction k); +#define lua_call(L,n,r) lua_callk(L, (n), (r), 0, NULL) + +LUA_API int (lua_getctx) (lua_State *L, int *ctx); + +LUA_API int (lua_pcallk) (lua_State *L, int nargs, int nresults, int errfunc, + int ctx, lua_CFunction k); +#define lua_pcall(L,n,r,f) lua_pcallk(L, (n), (r), (f), 0, NULL) + +LUA_API int (lua_load) (lua_State *L, lua_Reader reader, void *dt, + const char *chunkname, + const char *mode); + +LUA_API int (lua_dump) (lua_State *L, lua_Writer writer, void *data); + + +/* +** coroutine functions +*/ +LUA_API int (lua_yieldk) (lua_State *L, int nresults, int ctx, + lua_CFunction k); +#define lua_yield(L,n) lua_yieldk(L, (n), 0, NULL) +LUA_API int (lua_resume) (lua_State *L, lua_State *from, int narg); +LUA_API int (lua_status) (lua_State *L); + +/* +** garbage-collection function and options +*/ + +#define LUA_GCSTOP 0 +#define LUA_GCRESTART 1 +#define LUA_GCCOLLECT 2 +#define LUA_GCCOUNT 3 +#define LUA_GCCOUNTB 4 +#define LUA_GCSTEP 5 +#define LUA_GCSETPAUSE 6 +#define LUA_GCSETSTEPMUL 7 +#define LUA_GCSETMAJORINC 8 +#define LUA_GCISRUNNING 9 +#define LUA_GCGEN 10 +#define LUA_GCINC 11 + +LUA_API int (lua_gc) (lua_State *L, int what, int data); + + +/* +** miscellaneous functions +*/ + +LUA_API int (lua_error) (lua_State *L); + +LUA_API int (lua_next) (lua_State *L, int idx); + +LUA_API void (lua_concat) (lua_State *L, int n); +LUA_API void (lua_len) (lua_State *L, int idx); + +LUA_API lua_Alloc (lua_getallocf) (lua_State *L, void **ud); +LUA_API void (lua_setallocf) (lua_State *L, lua_Alloc f, void *ud); + + + +/* +** =============================================================== +** some useful macros +** =============================================================== +*/ + +#define lua_tonumber(L,i) lua_tonumberx(L,i,NULL) +#define lua_tointeger(L,i) lua_tointegerx(L,i,NULL) +#define lua_tounsigned(L,i) lua_tounsignedx(L,i,NULL) + +#define lua_pop(L,n) lua_settop(L, -(n)-1) + +#define lua_newtable(L) lua_createtable(L, 0, 0) + +#define lua_register(L,n,f) (lua_pushcfunction(L, (f)), lua_setglobal(L, (n))) + +#define lua_pushcfunction(L,f) lua_pushcclosure(L, (f), 0) + +#define lua_isfunction(L,n) (lua_type(L, (n)) == LUA_TFUNCTION) +#define lua_istable(L,n) (lua_type(L, (n)) == LUA_TTABLE) +#define lua_islightuserdata(L,n) (lua_type(L, (n)) == LUA_TLIGHTUSERDATA) +#define lua_isnil(L,n) (lua_type(L, (n)) == LUA_TNIL) +#define lua_isboolean(L,n) (lua_type(L, (n)) == LUA_TBOOLEAN) +#define lua_isthread(L,n) (lua_type(L, (n)) == LUA_TTHREAD) +#define lua_isnone(L,n) (lua_type(L, (n)) == LUA_TNONE) +#define lua_isnoneornil(L, n) (lua_type(L, (n)) <= 0) + +#define lua_pushliteral(L, s) \ + lua_pushlstring(L, "" s, (sizeof(s)/sizeof(char))-1) + +#define lua_pushglobaltable(L) \ + lua_rawgeti(L, LUA_REGISTRYINDEX, LUA_RIDX_GLOBALS) + +#define lua_tostring(L,i) lua_tolstring(L, (i), NULL) + + + +/* +** 
{====================================================================== +** Debug API +** ======================================================================= +*/ + + +/* +** Event codes +*/ +#define LUA_HOOKCALL 0 +#define LUA_HOOKRET 1 +#define LUA_HOOKLINE 2 +#define LUA_HOOKCOUNT 3 +#define LUA_HOOKTAILCALL 4 + + +/* +** Event masks +*/ +#define LUA_MASKCALL (1 << LUA_HOOKCALL) +#define LUA_MASKRET (1 << LUA_HOOKRET) +#define LUA_MASKLINE (1 << LUA_HOOKLINE) +#define LUA_MASKCOUNT (1 << LUA_HOOKCOUNT) + +typedef struct lua_Debug lua_Debug; /* activation record */ + + +/* Functions to be called by the debugger in specific events */ +typedef void (*lua_Hook) (lua_State *L, lua_Debug *ar); + + +LUA_API int (lua_getstack) (lua_State *L, int level, lua_Debug *ar); +LUA_API int (lua_getinfo) (lua_State *L, const char *what, lua_Debug *ar); +LUA_API const char *(lua_getlocal) (lua_State *L, const lua_Debug *ar, int n); +LUA_API const char *(lua_setlocal) (lua_State *L, const lua_Debug *ar, int n); +LUA_API const char *(lua_getupvalue) (lua_State *L, int funcindex, int n); +LUA_API const char *(lua_setupvalue) (lua_State *L, int funcindex, int n); + +LUA_API void *(lua_upvalueid) (lua_State *L, int fidx, int n); +LUA_API void (lua_upvaluejoin) (lua_State *L, int fidx1, int n1, + int fidx2, int n2); + +LUA_API int (lua_sethook) (lua_State *L, lua_Hook func, int mask, int count); +LUA_API lua_Hook (lua_gethook) (lua_State *L); +LUA_API int (lua_gethookmask) (lua_State *L); +LUA_API int (lua_gethookcount) (lua_State *L); + + +struct lua_Debug { + int event; + const char *name; /* (n) */ + const char *namewhat; /* (n) 'global', 'local', 'field', 'method' */ + const char *what; /* (S) 'Lua', 'C', 'main', 'tail' */ + const char *source; /* (S) */ + int currentline; /* (l) */ + int linedefined; /* (S) */ + int lastlinedefined; /* (S) */ + unsigned char nups; /* (u) number of upvalues */ + unsigned char nparams;/* (u) number of parameters */ + char isvararg; /* (u) */ + char istailcall; /* (t) */ + char short_src[LUA_IDSIZE]; /* (S) */ + /* private part */ + struct CallInfo *i_ci; /* active function */ +}; + +/* }====================================================================== */ + + +/****************************************************************************** +* Copyright (C) 1994-2013 Lua.org, PUC-Rio. +* +* Permission is hereby granted, free of charge, to any person obtaining +* a copy of this software and associated documentation files (the +* "Software"), to deal in the Software without restriction, including +* without limitation the rights to use, copy, modify, merge, publish, +* distribute, sublicense, and/or sell copies of the Software, and to +* permit persons to whom the Software is furnished to do so, subject to +* the following conditions: +* +* The above copyright notice and this permission notice shall be +* included in all copies or substantial portions of the Software. +* +* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE +* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+******************************************************************************/
+
+
+#endif
diff --git a/ext/lua/includes/luaconf.h b/ext/lua/includes/luaconf.h
new file mode 100644
index 000000000..df802c952
--- /dev/null
+++ b/ext/lua/includes/luaconf.h
@@ -0,0 +1,551 @@
+/*
+** $Id: luaconf.h,v 1.176 2013/03/16 21:10:18 roberto Exp $
+** Configuration file for Lua
+** See Copyright Notice in lua.h
+*/
+
+
+#ifndef lconfig_h
+#define lconfig_h
+
+#include <limits.h>
+#include <stddef.h>
+
+
+/*
+** ==================================================================
+** Search for "@@" to find all configurable definitions.
+** ===================================================================
+*/
+
+
+/*
+@@ LUA_ANSI controls the use of non-ansi features.
+** CHANGE it (define it) if you want Lua to avoid the use of any
+** non-ansi feature or library.
+*/
+#if !defined(LUA_ANSI) && defined(__STRICT_ANSI__)
+#define LUA_ANSI
+#endif
+
+
+#if !defined(LUA_ANSI) && defined(_WIN32) && !defined(_WIN32_WCE)
+#define LUA_WIN /* enable goodies for regular Windows platforms */
+#endif
+
+#if defined(LUA_WIN)
+#define LUA_DL_DLL
+#define LUA_USE_AFORMAT /* assume 'printf' handles 'aA' specifiers */
+#endif
+
+
+
+#if defined(LUA_USE_LINUX)
+#define LUA_USE_POSIX
+#define LUA_USE_DLOPEN /* needs an extra library: -ldl */
+#define LUA_USE_READLINE /* needs some extra libraries */
+#define LUA_USE_STRTODHEX /* assume 'strtod' handles hex formats */
+#define LUA_USE_AFORMAT /* assume 'printf' handles 'aA' specifiers */
+#define LUA_USE_LONGLONG /* assume support for long long */
+#endif
+
+#if defined(LUA_USE_MACOSX)
+#define LUA_USE_POSIX
+#define LUA_USE_DLOPEN /* does not need -ldl */
+#define LUA_USE_READLINE /* needs an extra library: -lreadline */
+#define LUA_USE_STRTODHEX /* assume 'strtod' handles hex formats */
+#define LUA_USE_AFORMAT /* assume 'printf' handles 'aA' specifiers */
+#define LUA_USE_LONGLONG /* assume support for long long */
+#endif
+
+
+
+/*
+@@ LUA_USE_POSIX includes all functionality listed as X/Open System
+@* Interfaces Extension (XSI).
+** CHANGE it (define it) if your system is XSI compatible.
+*/
+#if defined(LUA_USE_POSIX)
+#define LUA_USE_MKSTEMP
+#define LUA_USE_ISATTY
+#define LUA_USE_POPEN
+#define LUA_USE_ULONGJMP
+#define LUA_USE_GMTIME_R
+#endif
+
+
+
+/*
+@@ LUA_PATH_DEFAULT is the default path that Lua uses to look for
+@* Lua libraries.
+@@ LUA_CPATH_DEFAULT is the default path that Lua uses to look for
+@* C libraries.
+** CHANGE them if your machine has a non-conventional directory
+** hierarchy or if you want to install your libraries in
+** non-conventional directories.
+*/
+#if defined(_WIN32) /* { */
+/*
+** In Windows, any exclamation mark ('!') in the path is replaced by the
+** path of the directory of the executable file of the current process.
+*/
+#define LUA_LDIR "!\\lua\\"
+#define LUA_CDIR "!\\"
+#define LUA_PATH_DEFAULT \
+ LUA_LDIR"?.lua;" LUA_LDIR"?\\init.lua;" \
+ LUA_CDIR"?.lua;" LUA_CDIR"?\\init.lua;" ".\\?.lua"
+#define LUA_CPATH_DEFAULT \
+ LUA_CDIR"?.dll;" LUA_CDIR"loadall.dll;" ".\\?.dll"
+
+#else /* }{ */
+
+#define LUA_VDIR LUA_VERSION_MAJOR "."
LUA_VERSION_MINOR "/" +#define LUA_ROOT "/usr/local/" +#define LUA_LDIR LUA_ROOT "share/lua/" LUA_VDIR +#define LUA_CDIR LUA_ROOT "lib/lua/" LUA_VDIR +#define LUA_PATH_DEFAULT \ + LUA_LDIR"?.lua;" LUA_LDIR"?/init.lua;" \ + LUA_CDIR"?.lua;" LUA_CDIR"?/init.lua;" "./?.lua" +#define LUA_CPATH_DEFAULT \ + LUA_CDIR"?.so;" LUA_CDIR"loadall.so;" "./?.so" +#endif /* } */ + + +/* +@@ LUA_DIRSEP is the directory separator (for submodules). +** CHANGE it if your machine does not use "/" as the directory separator +** and is not Windows. (On Windows Lua automatically uses "\".) +*/ +#if defined(_WIN32) +#define LUA_DIRSEP "\\" +#else +#define LUA_DIRSEP "/" +#endif + + +/* +@@ LUA_ENV is the name of the variable that holds the current +@@ environment, used to access global names. +** CHANGE it if you do not like this name. +*/ +#define LUA_ENV "_ENV" + + +/* +@@ LUA_API is a mark for all core API functions. +@@ LUALIB_API is a mark for all auxiliary library functions. +@@ LUAMOD_API is a mark for all standard library opening functions. +** CHANGE them if you need to define those functions in some special way. +** For instance, if you want to create one Windows DLL with the core and +** the libraries, you may want to use the following definition (define +** LUA_BUILD_AS_DLL to get it). +*/ +#if defined(LUA_BUILD_AS_DLL) /* { */ + +#if defined(LUA_CORE) || defined(LUA_LIB) /* { */ +#define LUA_API __declspec(dllexport) +#else /* }{ */ +#define LUA_API __declspec(dllimport) +#endif /* } */ + +#else /* }{ */ + +#define LUA_API extern + +#endif /* } */ + + +/* more often than not the libs go together with the core */ +#define LUALIB_API LUA_API +#define LUAMOD_API LUALIB_API + + +/* +@@ LUAI_FUNC is a mark for all extern functions that are not to be +@* exported to outside modules. +@@ LUAI_DDEF and LUAI_DDEC are marks for all extern (const) variables +@* that are not to be exported to outside modules (LUAI_DDEF for +@* definitions and LUAI_DDEC for declarations). +** CHANGE them if you need to mark them in some special way. Elf/gcc +** (versions 3.2 and later) mark them as "hidden" to optimize access +** when Lua is compiled as a shared library. Not all elf targets support +** this attribute. Unfortunately, gcc does not offer a way to check +** whether the target offers that support, and those without support +** give a warning about it. To avoid these warnings, change to the +** default definition. +*/ +#if defined(__GNUC__) && ((__GNUC__*100 + __GNUC_MINOR__) >= 302) && \ + defined(__ELF__) /* { */ +#define LUAI_FUNC __attribute__((visibility("hidden"))) extern +#define LUAI_DDEC LUAI_FUNC +#define LUAI_DDEF /* empty */ + +#else /* }{ */ +#define LUAI_FUNC extern +#define LUAI_DDEC extern +#define LUAI_DDEF /* empty */ +#endif /* } */ + + + +/* +@@ LUA_QL describes how error messages quote program elements. +** CHANGE it if you want a different appearance. +*/ +#define LUA_QL(x) "'" x "'" +#define LUA_QS LUA_QL("%s") + + +/* +@@ LUA_IDSIZE gives the maximum size for the description of the source +@* of a function in debug information. +** CHANGE it if you want a different size. +*/ +#define LUA_IDSIZE 60 + + +/* +@@ luai_writestring/luai_writeline define how 'print' prints its results. +** They are only used in libraries and the stand-alone program. (The #if +** avoids including 'stdio.h' everywhere.) 
+*/
+#if defined(LUA_LIB) || defined(lua_c)
+#include <stdio.h>
+#define luai_writestring(s,l) fwrite((s), sizeof(char), (l), stdout)
+#define luai_writeline() (luai_writestring("\n", 1), fflush(stdout))
+#endif
+
+/*
+@@ luai_writestringerror defines how to print error messages.
+** (A format string with one argument is enough for Lua...)
+*/
+#define luai_writestringerror(s,p) \
+ (fprintf(stderr, (s), (p)), fflush(stderr))
+
+
+/*
+@@ LUAI_MAXSHORTLEN is the maximum length for short strings, that is,
+** strings that are internalized. (Cannot be smaller than reserved words
+** or tags for metamethods, as these strings must be internalized;
+** #("function") = 8, #("__newindex") = 10.)
+*/
+#define LUAI_MAXSHORTLEN 40
+
+
+
+/*
+** {==================================================================
+** Compatibility with previous versions
+** ===================================================================
+*/
+
+/*
+@@ LUA_COMPAT_ALL controls all compatibility options.
+** You can define it to get all options, or change specific options
+** to fit your specific needs.
+*/
+#if defined(LUA_COMPAT_ALL) /* { */
+
+/*
+@@ LUA_COMPAT_UNPACK controls the presence of global 'unpack'.
+** You can replace it with 'table.unpack'.
+*/
+#define LUA_COMPAT_UNPACK
+
+/*
+@@ LUA_COMPAT_LOADERS controls the presence of table 'package.loaders'.
+** You can replace it with 'package.searchers'.
+*/
+#define LUA_COMPAT_LOADERS
+
+/*
+@@ macro 'lua_cpcall' emulates deprecated function lua_cpcall.
+** You can call your C function directly (with light C functions).
+*/
+#define lua_cpcall(L,f,u) \
+ (lua_pushcfunction(L, (f)), \
+ lua_pushlightuserdata(L,(u)), \
+ lua_pcall(L,1,0,0))
+
+
+/*
+@@ LUA_COMPAT_LOG10 defines the function 'log10' in the math library.
+** You can rewrite 'log10(x)' as 'log(x, 10)'.
+*/
+#define LUA_COMPAT_LOG10
+
+/*
+@@ LUA_COMPAT_LOADSTRING defines the function 'loadstring' in the base
+** library. You can rewrite 'loadstring(s)' as 'load(s)'.
+*/
+#define LUA_COMPAT_LOADSTRING
+
+/*
+@@ LUA_COMPAT_MAXN defines the function 'maxn' in the table library.
+*/
+#define LUA_COMPAT_MAXN
+
+/*
+@@ The following macros supply trivial compatibility for some
+** changes in the API. The macros themselves document how to
+** change your code to avoid using them.
+*/
+#define lua_strlen(L,i) lua_rawlen(L, (i))
+
+#define lua_objlen(L,i) lua_rawlen(L, (i))
+
+#define lua_equal(L,idx1,idx2) lua_compare(L,(idx1),(idx2),LUA_OPEQ)
+#define lua_lessthan(L,idx1,idx2) lua_compare(L,(idx1),(idx2),LUA_OPLT)
+
+/*
+@@ LUA_COMPAT_MODULE controls compatibility with previous
+** module functions 'module' (Lua) and 'luaL_register' (C).
+*/
+#define LUA_COMPAT_MODULE
+
+#endif /* } */
+
+/* }================================================================== */
+
+
+
+/*
+@@ LUAI_BITSINT defines the number of bits in an int.
+** CHANGE here if Lua cannot automatically detect the number of bits of
+** your machine. Probably you do not need to change this.
+*/
+/* avoid overflows in comparison */
+#if INT_MAX-20 < 32760 /* { */
+#define LUAI_BITSINT 16
+#elif INT_MAX > 2147483640L /* }{ */
+/* int has at least 32 bits */
+#define LUAI_BITSINT 32
+#else /* }{ */
+#error "you must define LUA_BITSINT with number of bits in an integer"
+#endif /* } */
+
+
+/*
+@@ LUA_INT32 is a signed integer with exactly 32 bits.
+@@ LUAI_UMEM is an unsigned integer big enough to count the total
+@* memory used by Lua.
+@@ LUAI_MEM is a signed integer big enough to count the total memory
+@* used by Lua.
+** CHANGE here if for some weird reason the default definitions are not +** good enough for your machine. Probably you do not need to change +** this. +*/ +#if LUAI_BITSINT >= 32 /* { */ +#define LUA_INT32 int +#define LUAI_UMEM size_t +#define LUAI_MEM ptrdiff_t +#else /* }{ */ +/* 16-bit ints */ +#define LUA_INT32 long +#define LUAI_UMEM unsigned long +#define LUAI_MEM long +#endif /* } */ + + +/* +@@ LUAI_MAXSTACK limits the size of the Lua stack. +** CHANGE it if you need a different limit. This limit is arbitrary; +** its only purpose is to stop Lua to consume unlimited stack +** space (and to reserve some numbers for pseudo-indices). +*/ +#if LUAI_BITSINT >= 32 +#define LUAI_MAXSTACK 1000000 +#else +#define LUAI_MAXSTACK 15000 +#endif + +/* reserve some space for error handling */ +#define LUAI_FIRSTPSEUDOIDX (-LUAI_MAXSTACK - 1000) + + + + +/* +@@ LUAL_BUFFERSIZE is the buffer size used by the lauxlib buffer system. +** CHANGE it if it uses too much C-stack space. +*/ +#define LUAL_BUFFERSIZE BUFSIZ + + + + +/* +** {================================================================== +@@ LUA_NUMBER is the type of numbers in Lua. +** CHANGE the following definitions only if you want to build Lua +** with a number type different from double. You may also need to +** change lua_number2int & lua_number2integer. +** =================================================================== +*/ + +#define LUA_NUMBER_DOUBLE +#define LUA_NUMBER double + +/* +@@ LUAI_UACNUMBER is the result of an 'usual argument conversion' +@* over a number. +*/ +#define LUAI_UACNUMBER double + + +/* +@@ LUA_NUMBER_SCAN is the format for reading numbers. +@@ LUA_NUMBER_FMT is the format for writing numbers. +@@ lua_number2str converts a number to a string. +@@ LUAI_MAXNUMBER2STR is maximum size of previous conversion. +*/ +#define LUA_NUMBER_SCAN "%lf" +#define LUA_NUMBER_FMT "%.14g" +#define lua_number2str(s,n) sprintf((s), LUA_NUMBER_FMT, (n)) +#define LUAI_MAXNUMBER2STR 32 /* 16 digits, sign, point, and \0 */ + + +/* +@@ l_mathop allows the addition of an 'l' or 'f' to all math operations +*/ +#define l_mathop(x) (x) + + +/* +@@ lua_str2number converts a decimal numeric string to a number. +@@ lua_strx2number converts an hexadecimal numeric string to a number. +** In C99, 'strtod' does both conversions. C89, however, has no function +** to convert floating hexadecimal strings to numbers. For these +** systems, you can leave 'lua_strx2number' undefined and Lua will +** provide its own implementation. +*/ +#define lua_str2number(s,p) strtod((s), (p)) + +#if defined(LUA_USE_STRTODHEX) +#define lua_strx2number(s,p) strtod((s), (p)) +#endif + + +/* +@@ The luai_num* macros define the primitive operations over numbers. +*/ + +/* the following operations need the math library */ +#if defined(lobject_c) || defined(lvm_c) +#include +#define luai_nummod(L,a,b) ((a) - l_mathop(floor)((a)/(b))*(b)) +#define luai_numpow(L,a,b) (l_mathop(pow)(a,b)) +#endif + +/* these are quite standard operations */ +#if defined(LUA_CORE) +#define luai_numadd(L,a,b) ((a)+(b)) +#define luai_numsub(L,a,b) ((a)-(b)) +#define luai_nummul(L,a,b) ((a)*(b)) +#define luai_numdiv(L,a,b) ((a)/(b)) +#define luai_numunm(L,a) (-(a)) +#define luai_numeq(a,b) ((a)==(b)) +#define luai_numlt(L,a,b) ((a)<(b)) +#define luai_numle(L,a,b) ((a)<=(b)) +#define luai_numisnan(L,a) (!luai_numeq((a), (a))) +#endif + + + +/* +@@ LUA_INTEGER is the integral type used by lua_pushinteger/lua_tointeger. +** CHANGE that if ptrdiff_t is not adequate on your machine. 
(On most +** machines, ptrdiff_t gives a good choice between int or long.) +*/ +#define LUA_INTEGER ptrdiff_t + +/* +@@ LUA_UNSIGNED is the integral type used by lua_pushunsigned/lua_tounsigned. +** It must have at least 32 bits. +*/ +#define LUA_UNSIGNED unsigned LUA_INT32 + + + +/* +** Some tricks with doubles +*/ + +#if defined(LUA_NUMBER_DOUBLE) && !defined(LUA_ANSI) /* { */ +/* +** The next definitions activate some tricks to speed up the +** conversion from doubles to integer types, mainly to LUA_UNSIGNED. +** +@@ LUA_MSASMTRICK uses Microsoft assembler to avoid clashes with a +** DirectX idiosyncrasy. +** +@@ LUA_IEEE754TRICK uses a trick that should work on any machine +** using IEEE754 with a 32-bit integer type. +** +@@ LUA_IEEELL extends the trick to LUA_INTEGER; should only be +** defined when LUA_INTEGER is a 32-bit integer. +** +@@ LUA_IEEEENDIAN is the endianness of doubles in your machine +** (0 for little endian, 1 for big endian); if not defined, Lua will +** check it dynamically for LUA_IEEE754TRICK (but not for LUA_NANTRICK). +** +@@ LUA_NANTRICK controls the use of a trick to pack all types into +** a single double value, using NaN values to represent non-number +** values. The trick only works on 32-bit machines (ints and pointers +** are 32-bit values) with numbers represented as IEEE 754-2008 doubles +** with conventional endianess (12345678 or 87654321), in CPUs that do +** not produce signaling NaN values (all NaNs are quiet). +*/ + +/* Microsoft compiler on a Pentium (32 bit) ? */ +#if defined(LUA_WIN) && defined(_MSC_VER) && defined(_M_IX86) /* { */ + +#define LUA_MSASMTRICK +#define LUA_IEEEENDIAN 0 +#define LUA_NANTRICK + + +/* pentium 32 bits? */ +#elif defined(__i386__) || defined(__i386) || defined(__X86__) /* }{ */ + +#define LUA_IEEE754TRICK +#define LUA_IEEELL +#define LUA_IEEEENDIAN 0 +#define LUA_NANTRICK + +/* pentium 64 bits? */ +#elif defined(__x86_64) /* }{ */ + +#define LUA_IEEE754TRICK +#define LUA_IEEEENDIAN 0 + +#elif defined(__POWERPC__) || defined(__ppc__) /* }{ */ + +#define LUA_IEEE754TRICK +#define LUA_IEEEENDIAN 1 + +#else /* }{ */ + +/* assume IEEE754 and a 32-bit integer type */ +#define LUA_IEEE754TRICK + +#endif /* } */ + +#endif /* } */ + +/* }================================================================== */ + + + + +/* =================================================================== */ + +/* +** Local configuration. You can use this space to add your redefinitions +** without modifying the main part of the file. 
+*/ + + + +#endif + diff --git a/ext/lua/includes/lualib.h b/ext/lua/includes/lualib.h new file mode 100644 index 000000000..9fd126bf7 --- /dev/null +++ b/ext/lua/includes/lualib.h @@ -0,0 +1,55 @@ +/* +** $Id: lualib.h,v 1.43 2011/12/08 12:11:37 roberto Exp $ +** Lua standard libraries +** See Copyright Notice in lua.h +*/ + + +#ifndef lualib_h +#define lualib_h + +#include "lua.h" + + + +LUAMOD_API int (luaopen_base) (lua_State *L); + +#define LUA_COLIBNAME "coroutine" +LUAMOD_API int (luaopen_coroutine) (lua_State *L); + +#define LUA_TABLIBNAME "table" +LUAMOD_API int (luaopen_table) (lua_State *L); + +#define LUA_IOLIBNAME "io" +LUAMOD_API int (luaopen_io) (lua_State *L); + +#define LUA_OSLIBNAME "os" +LUAMOD_API int (luaopen_os) (lua_State *L); + +#define LUA_STRLIBNAME "string" +LUAMOD_API int (luaopen_string) (lua_State *L); + +#define LUA_BITLIBNAME "bit32" +LUAMOD_API int (luaopen_bit32) (lua_State *L); + +#define LUA_MATHLIBNAME "math" +LUAMOD_API int (luaopen_math) (lua_State *L); + +#define LUA_DBLIBNAME "debug" +LUAMOD_API int (luaopen_debug) (lua_State *L); + +#define LUA_LOADLIBNAME "package" +LUAMOD_API int (luaopen_package) (lua_State *L); + + +/* open all previous libraries */ +LUALIB_API void (luaL_openlibs) (lua_State *L); + + + +#if !defined(lua_assert) +#define lua_assert(x) ((void)0) +#endif + + +#endif diff --git a/ext/lua/includes/lundump.h b/ext/lua/includes/lundump.h new file mode 100644 index 000000000..2b8accecb --- /dev/null +++ b/ext/lua/includes/lundump.h @@ -0,0 +1,28 @@ +/* +** $Id: lundump.h,v 1.39 2012/05/08 13:53:33 roberto Exp $ +** load precompiled Lua chunks +** See Copyright Notice in lua.h +*/ + +#ifndef lundump_h +#define lundump_h + +#include "lobject.h" +#include "lzio.h" + +/* load one chunk; from lundump.c */ +LUAI_FUNC Closure* luaU_undump (lua_State* L, ZIO* Z, Mbuffer* buff, const char* name); + +/* make header; from lundump.c */ +LUAI_FUNC void luaU_header (lu_byte* h); + +/* dump one chunk; from ldump.c */ +LUAI_FUNC int luaU_dump (lua_State* L, const Proto* f, lua_Writer w, void* data, int strip); + +/* data to catch conversion errors */ +#define LUAC_TAIL "\x19\x93\r\n\x1a\n" + +/* size in bytes of header of binary files */ +#define LUAC_HEADERSIZE (sizeof(LUA_SIGNATURE)-sizeof(char)+2+6+sizeof(LUAC_TAIL)-sizeof(char)) + +#endif diff --git a/ext/lua/includes/lvm.h b/ext/lua/includes/lvm.h new file mode 100644 index 000000000..07e25f9c6 --- /dev/null +++ b/ext/lua/includes/lvm.h @@ -0,0 +1,44 @@ +/* +** $Id: lvm.h,v 2.18 2013/01/08 14:06:55 roberto Exp $ +** Lua virtual machine +** See Copyright Notice in lua.h +*/ + +#ifndef lvm_h +#define lvm_h + + +#include "ldo.h" +#include "lobject.h" +#include "ltm.h" + + +#define tostring(L,o) (ttisstring(o) || (luaV_tostring(L, o))) + +#define tonumber(o,n) (ttisnumber(o) || (((o) = luaV_tonumber(o,n)) != NULL)) + +#define equalobj(L,o1,o2) (ttisequal(o1, o2) && luaV_equalobj_(L, o1, o2)) + +#define luaV_rawequalobj(o1,o2) equalobj(NULL,o1,o2) + + +/* not to called directly */ +LUAI_FUNC int luaV_equalobj_ (lua_State *L, const TValue *t1, const TValue *t2); + + +LUAI_FUNC int luaV_lessthan (lua_State *L, const TValue *l, const TValue *r); +LUAI_FUNC int luaV_lessequal (lua_State *L, const TValue *l, const TValue *r); +LUAI_FUNC const TValue *luaV_tonumber (const TValue *obj, TValue *n); +LUAI_FUNC int luaV_tostring (lua_State *L, StkId obj); +LUAI_FUNC void luaV_gettable (lua_State *L, const TValue *t, TValue *key, + StkId val); +LUAI_FUNC void luaV_settable (lua_State *L, const TValue *t, TValue 
*key,
+ StkId val);
+LUAI_FUNC void luaV_finishOp (lua_State *L);
+LUAI_FUNC void luaV_execute (lua_State *L);
+LUAI_FUNC void luaV_concat (lua_State *L, int total);
+LUAI_FUNC void luaV_arith (lua_State *L, StkId ra, const TValue *rb,
+ const TValue *rc, TMS op);
+LUAI_FUNC void luaV_objlen (lua_State *L, StkId ra, const TValue *rb);
+
+#endif
diff --git a/ext/lua/includes/lzio.h b/ext/lua/includes/lzio.h
new file mode 100644
index 000000000..08682301e
--- /dev/null
+++ b/ext/lua/includes/lzio.h
@@ -0,0 +1,65 @@
+/*
+** $Id: lzio.h,v 1.26 2011/07/15 12:48:03 roberto Exp $
+** Buffered streams
+** See Copyright Notice in lua.h
+*/
+
+
+#ifndef lzio_h
+#define lzio_h
+
+#include "lua.h"
+
+#include "lmem.h"
+
+
+#define EOZ (-1) /* end of stream */
+
+typedef struct Zio ZIO;
+
+#define zgetc(z) (((z)->n--)>0 ? cast_uchar(*(z)->p++) : luaZ_fill(z))
+
+
+typedef struct Mbuffer {
+ char *buffer;
+ size_t n;
+ size_t buffsize;
+} Mbuffer;
+
+#define luaZ_initbuffer(L, buff) ((buff)->buffer = NULL, (buff)->buffsize = 0)
+
+#define luaZ_buffer(buff) ((buff)->buffer)
+#define luaZ_sizebuffer(buff) ((buff)->buffsize)
+#define luaZ_bufflen(buff) ((buff)->n)
+
+#define luaZ_resetbuffer(buff) ((buff)->n = 0)
+
+
+#define luaZ_resizebuffer(L, buff, size) \
+ (luaM_reallocvector(L, (buff)->buffer, (buff)->buffsize, size, char), \
+ (buff)->buffsize = size)
+
+#define luaZ_freebuffer(L, buff) luaZ_resizebuffer(L, buff, 0)
+
+
+LUAI_FUNC char *luaZ_openspace (lua_State *L, Mbuffer *buff, size_t n);
+LUAI_FUNC void luaZ_init (lua_State *L, ZIO *z, lua_Reader reader,
+ void *data);
+LUAI_FUNC size_t luaZ_read (ZIO* z, void* b, size_t n); /* read next n bytes */
+
+
+
+/* --------- Private Part ------------------ */
+
+struct Zio {
+ size_t n; /* bytes still unread */
+ const char *p; /* current position in buffer */
+ lua_Reader reader; /* reader function */
+ void* data; /* additional data */
+ lua_State *L; /* Lua state (for reader) */
+};
+
+
+LUAI_FUNC int luaZ_fill (ZIO *z);
+
+#endif
diff --git a/ext/lua/src/lapi.c b/ext/lua/src/lapi.c
new file mode 100644
index 000000000..791d85454
--- /dev/null
+++ b/ext/lua/src/lapi.c
@@ -0,0 +1,1284 @@
+/*
+** $Id: lapi.c,v 2.171 2013/03/16 21:10:18 roberto Exp $
+** Lua API
+** See Copyright Notice in lua.h
+*/
+
+
+#include <stdarg.h>
+#include <string.h>
+
+#define lapi_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "lapi.h"
+#include "ldebug.h"
+#include "ldo.h"
+#include "lfunc.h"
+#include "lgc.h"
+#include "lmem.h"
+#include "lobject.h"
+#include "lstate.h"
+#include "lstring.h"
+#include "ltable.h"
+#include "ltm.h"
+#include "lundump.h"
+#include "lvm.h"
+
+
+
+const char lua_ident[] =
+ "$LuaVersion: " LUA_COPYRIGHT " $"
+ "$LuaAuthors: " LUA_AUTHORS " $";
+
+
+/* value at a non-valid index */
+#define NONVALIDVALUE cast(TValue *, luaO_nilobject)
+
+/* corresponding test */
+#define isvalid(o) ((o) != luaO_nilobject)
+
+/* test for pseudo index */
+#define ispseudo(i) ((i) <= LUA_REGISTRYINDEX)
+
+/* test for valid but not pseudo index */
+#define isstackindex(i, o) (isvalid(o) && !ispseudo(i))
+
+#define api_checkvalidindex(L, o) api_check(L, isvalid(o), "invalid index")
+
+#define api_checkstackindex(L, i, o) \
+ api_check(L, isstackindex(i, o), "index not in the stack")
+
+
+static TValue *index2addr (lua_State *L, int idx) {
+ CallInfo *ci = L->ci;
+ if (idx > 0) {
+ TValue *o = ci->func + idx;
+ api_check(L, idx <= ci->top - (ci->func + 1), "unacceptable index");
+ if (o >= L->top) return NONVALIDVALUE;
+ else return o;
+ }
+ else if (!ispseudo(idx)) { /* negative
index */ + api_check(L, idx != 0 && -idx <= L->top - (ci->func + 1), "invalid index"); + return L->top + idx; + } + else if (idx == LUA_REGISTRYINDEX) + return &G(L)->l_registry; + else { /* upvalues */ + idx = LUA_REGISTRYINDEX - idx; + api_check(L, idx <= MAXUPVAL + 1, "upvalue index too large"); + if (ttislcf(ci->func)) /* light C function? */ + return NONVALIDVALUE; /* it has no upvalues */ + else { + CClosure *func = clCvalue(ci->func); + return (idx <= func->nupvalues) ? &func->upvalue[idx-1] : NONVALIDVALUE; + } + } +} + + +/* +** to be called by 'lua_checkstack' in protected mode, to grow stack +** capturing memory errors +*/ +static void growstack (lua_State *L, void *ud) { + int size = *(int *)ud; + luaD_growstack(L, size); +} + + +LUA_API int lua_checkstack (lua_State *L, int size) { + int res; + CallInfo *ci = L->ci; + lua_lock(L); + if (L->stack_last - L->top > size) /* stack large enough? */ + res = 1; /* yes; check is OK */ + else { /* no; need to grow stack */ + int inuse = cast_int(L->top - L->stack) + EXTRA_STACK; + if (inuse > LUAI_MAXSTACK - size) /* can grow without overflow? */ + res = 0; /* no */ + else /* try to grow stack */ + res = (luaD_rawrunprotected(L, &growstack, &size) == LUA_OK); + } + if (res && ci->top < L->top + size) + ci->top = L->top + size; /* adjust frame top */ + lua_unlock(L); + return res; +} + + +LUA_API void lua_xmove (lua_State *from, lua_State *to, int n) { + int i; + if (from == to) return; + lua_lock(to); + api_checknelems(from, n); + api_check(from, G(from) == G(to), "moving among independent states"); + api_check(from, to->ci->top - to->top >= n, "not enough elements to move"); + from->top -= n; + for (i = 0; i < n; i++) { + setobj2s(to, to->top++, from->top + i); + } + lua_unlock(to); +} + + +LUA_API lua_CFunction lua_atpanic (lua_State *L, lua_CFunction panicf) { + lua_CFunction old; + lua_lock(L); + old = G(L)->panic; + G(L)->panic = panicf; + lua_unlock(L); + return old; +} + + +LUA_API const lua_Number *lua_version (lua_State *L) { + static const lua_Number version = LUA_VERSION_NUM; + if (L == NULL) return &version; + else return G(L)->version; +} + + + +/* +** basic stack manipulation +*/ + + +/* +** convert an acceptable stack index into an absolute index +*/ +LUA_API int lua_absindex (lua_State *L, int idx) { + return (idx > 0 || ispseudo(idx)) + ? 
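+ /* positive and pseudo indices are already absolute;
+    negative ones are rebased against the current stack top */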
idx + : cast_int(L->top - L->ci->func + idx); +} + + +LUA_API int lua_gettop (lua_State *L) { + return cast_int(L->top - (L->ci->func + 1)); +} + + +LUA_API void lua_settop (lua_State *L, int idx) { + StkId func = L->ci->func; + lua_lock(L); + if (idx >= 0) { + api_check(L, idx <= L->stack_last - (func + 1), "new top too large"); + while (L->top < (func + 1) + idx) + setnilvalue(L->top++); + L->top = (func + 1) + idx; + } + else { + api_check(L, -(idx+1) <= (L->top - (func + 1)), "invalid new top"); + L->top += idx+1; /* `subtract' index (index is negative) */ + } + lua_unlock(L); +} + + +LUA_API void lua_remove (lua_State *L, int idx) { + StkId p; + lua_lock(L); + p = index2addr(L, idx); + api_checkstackindex(L, idx, p); + while (++p < L->top) setobjs2s(L, p-1, p); + L->top--; + lua_unlock(L); +} + + +LUA_API void lua_insert (lua_State *L, int idx) { + StkId p; + StkId q; + lua_lock(L); + p = index2addr(L, idx); + api_checkstackindex(L, idx, p); + for (q = L->top; q > p; q--) /* use L->top as a temporary */ + setobjs2s(L, q, q - 1); + setobjs2s(L, p, L->top); + lua_unlock(L); +} + + +static void moveto (lua_State *L, TValue *fr, int idx) { + TValue *to = index2addr(L, idx); + api_checkvalidindex(L, to); + setobj(L, to, fr); + if (idx < LUA_REGISTRYINDEX) /* function upvalue? */ + luaC_barrier(L, clCvalue(L->ci->func), fr); + /* LUA_REGISTRYINDEX does not need gc barrier + (collector revisits it before finishing collection) */ +} + + +LUA_API void lua_replace (lua_State *L, int idx) { + lua_lock(L); + api_checknelems(L, 1); + moveto(L, L->top - 1, idx); + L->top--; + lua_unlock(L); +} + + +LUA_API void lua_copy (lua_State *L, int fromidx, int toidx) { + TValue *fr; + lua_lock(L); + fr = index2addr(L, fromidx); + moveto(L, fr, toidx); + lua_unlock(L); +} + + +LUA_API void lua_pushvalue (lua_State *L, int idx) { + lua_lock(L); + setobj2s(L, L->top, index2addr(L, idx)); + api_incr_top(L); + lua_unlock(L); +} + + + +/* +** access functions (stack -> C) +*/ + + +LUA_API int lua_type (lua_State *L, int idx) { + StkId o = index2addr(L, idx); + return (isvalid(o) ? ttypenv(o) : LUA_TNONE); +} + + +LUA_API const char *lua_typename (lua_State *L, int t) { + UNUSED(L); + return ttypename(t); +} + + +LUA_API int lua_iscfunction (lua_State *L, int idx) { + StkId o = index2addr(L, idx); + return (ttislcf(o) || (ttisCclosure(o))); +} + + +LUA_API int lua_isnumber (lua_State *L, int idx) { + TValue n; + const TValue *o = index2addr(L, idx); + return tonumber(o, &n); +} + + +LUA_API int lua_isstring (lua_State *L, int idx) { + int t = lua_type(L, idx); + return (t == LUA_TSTRING || t == LUA_TNUMBER); +} + + +LUA_API int lua_isuserdata (lua_State *L, int idx) { + const TValue *o = index2addr(L, idx); + return (ttisuserdata(o) || ttislightuserdata(o)); +} + + +LUA_API int lua_rawequal (lua_State *L, int index1, int index2) { + StkId o1 = index2addr(L, index1); + StkId o2 = index2addr(L, index2); + return (isvalid(o1) && isvalid(o2)) ? 
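+ /* 'raw' equality: primitive comparison, no metamethods consulted */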
luaV_rawequalobj(o1, o2) : 0; +} + + +LUA_API void lua_arith (lua_State *L, int op) { + StkId o1; /* 1st operand */ + StkId o2; /* 2nd operand */ + lua_lock(L); + if (op != LUA_OPUNM) /* all other operations expect two operands */ + api_checknelems(L, 2); + else { /* for unary minus, add fake 2nd operand */ + api_checknelems(L, 1); + setobjs2s(L, L->top, L->top - 1); + L->top++; + } + o1 = L->top - 2; + o2 = L->top - 1; + if (ttisnumber(o1) && ttisnumber(o2)) { + setnvalue(o1, luaO_arith(op, nvalue(o1), nvalue(o2))); + } + else + luaV_arith(L, o1, o1, o2, cast(TMS, op - LUA_OPADD + TM_ADD)); + L->top--; + lua_unlock(L); +} + + +LUA_API int lua_compare (lua_State *L, int index1, int index2, int op) { + StkId o1, o2; + int i = 0; + lua_lock(L); /* may call tag method */ + o1 = index2addr(L, index1); + o2 = index2addr(L, index2); + if (isvalid(o1) && isvalid(o2)) { + switch (op) { + case LUA_OPEQ: i = equalobj(L, o1, o2); break; + case LUA_OPLT: i = luaV_lessthan(L, o1, o2); break; + case LUA_OPLE: i = luaV_lessequal(L, o1, o2); break; + default: api_check(L, 0, "invalid option"); + } + } + lua_unlock(L); + return i; +} + + +LUA_API lua_Number lua_tonumberx (lua_State *L, int idx, int *isnum) { + TValue n; + const TValue *o = index2addr(L, idx); + if (tonumber(o, &n)) { + if (isnum) *isnum = 1; + return nvalue(o); + } + else { + if (isnum) *isnum = 0; + return 0; + } +} + + +LUA_API lua_Integer lua_tointegerx (lua_State *L, int idx, int *isnum) { + TValue n; + const TValue *o = index2addr(L, idx); + if (tonumber(o, &n)) { + lua_Integer res; + lua_Number num = nvalue(o); + lua_number2integer(res, num); + if (isnum) *isnum = 1; + return res; + } + else { + if (isnum) *isnum = 0; + return 0; + } +} + + +LUA_API lua_Unsigned lua_tounsignedx (lua_State *L, int idx, int *isnum) { + TValue n; + const TValue *o = index2addr(L, idx); + if (tonumber(o, &n)) { + lua_Unsigned res; + lua_Number num = nvalue(o); + lua_number2unsigned(res, num); + if (isnum) *isnum = 1; + return res; + } + else { + if (isnum) *isnum = 0; + return 0; + } +} + + +LUA_API int lua_toboolean (lua_State *L, int idx) { + const TValue *o = index2addr(L, idx); + return !l_isfalse(o); +} + + +LUA_API const char *lua_tolstring (lua_State *L, int idx, size_t *len) { + StkId o = index2addr(L, idx); + if (!ttisstring(o)) { + lua_lock(L); /* `luaV_tostring' may create a new string */ + if (!luaV_tostring(L, o)) { /* conversion failed? */ + if (len != NULL) *len = 0; + lua_unlock(L); + return NULL; + } + luaC_checkGC(L); + o = index2addr(L, idx); /* previous call may reallocate the stack */ + lua_unlock(L); + } + if (len != NULL) *len = tsvalue(o)->len; + return svalue(o); +} + + +LUA_API size_t lua_rawlen (lua_State *L, int idx) { + StkId o = index2addr(L, idx); + switch (ttypenv(o)) { + case LUA_TSTRING: return tsvalue(o)->len; + case LUA_TUSERDATA: return uvalue(o)->len; + case LUA_TTABLE: return luaH_getn(hvalue(o)); + default: return 0; + } +} + + +LUA_API lua_CFunction lua_tocfunction (lua_State *L, int idx) { + StkId o = index2addr(L, idx); + if (ttislcf(o)) return fvalue(o); + else if (ttisCclosure(o)) + return clCvalue(o)->f; + else return NULL; /* not a C function */ +} + + +LUA_API void *lua_touserdata (lua_State *L, int idx) { + StkId o = index2addr(L, idx); + switch (ttypenv(o)) { + case LUA_TUSERDATA: return (rawuvalue(o) + 1); + case LUA_TLIGHTUSERDATA: return pvalue(o); + default: return NULL; + } +} + + +LUA_API lua_State *lua_tothread (lua_State *L, int idx) { + StkId o = index2addr(L, idx); + return (!ttisthread(o)) ? 
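+ /* NULL when the value at 'idx' is not a thread */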
NULL : thvalue(o); +} + + +LUA_API const void *lua_topointer (lua_State *L, int idx) { + StkId o = index2addr(L, idx); + switch (ttype(o)) { + case LUA_TTABLE: return hvalue(o); + case LUA_TLCL: return clLvalue(o); + case LUA_TCCL: return clCvalue(o); + case LUA_TLCF: return cast(void *, cast(size_t, fvalue(o))); + case LUA_TTHREAD: return thvalue(o); + case LUA_TUSERDATA: + case LUA_TLIGHTUSERDATA: + return lua_touserdata(L, idx); + default: return NULL; + } +} + + + +/* +** push functions (C -> stack) +*/ + + +LUA_API void lua_pushnil (lua_State *L) { + lua_lock(L); + setnilvalue(L->top); + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API void lua_pushnumber (lua_State *L, lua_Number n) { + lua_lock(L); + setnvalue(L->top, n); + luai_checknum(L, L->top, + luaG_runerror(L, "C API - attempt to push a signaling NaN")); + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API void lua_pushinteger (lua_State *L, lua_Integer n) { + lua_lock(L); + setnvalue(L->top, cast_num(n)); + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API void lua_pushunsigned (lua_State *L, lua_Unsigned u) { + lua_Number n; + lua_lock(L); + n = lua_unsigned2number(u); + setnvalue(L->top, n); + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API const char *lua_pushlstring (lua_State *L, const char *s, size_t len) { + TString *ts; + lua_lock(L); + luaC_checkGC(L); + ts = luaS_newlstr(L, s, len); + setsvalue2s(L, L->top, ts); + api_incr_top(L); + lua_unlock(L); + return getstr(ts); +} + + +LUA_API const char *lua_pushstring (lua_State *L, const char *s) { + if (s == NULL) { + lua_pushnil(L); + return NULL; + } + else { + TString *ts; + lua_lock(L); + luaC_checkGC(L); + ts = luaS_new(L, s); + setsvalue2s(L, L->top, ts); + api_incr_top(L); + lua_unlock(L); + return getstr(ts); + } +} + + +LUA_API const char *lua_pushvfstring (lua_State *L, const char *fmt, + va_list argp) { + const char *ret; + lua_lock(L); + luaC_checkGC(L); + ret = luaO_pushvfstring(L, fmt, argp); + lua_unlock(L); + return ret; +} + + +LUA_API const char *lua_pushfstring (lua_State *L, const char *fmt, ...) 
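+/* like 'lua_pushvfstring', but takes its arguments directly via '...' */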
{ + const char *ret; + va_list argp; + lua_lock(L); + luaC_checkGC(L); + va_start(argp, fmt); + ret = luaO_pushvfstring(L, fmt, argp); + va_end(argp); + lua_unlock(L); + return ret; +} + + +LUA_API void lua_pushcclosure (lua_State *L, lua_CFunction fn, int n) { + lua_lock(L); + if (n == 0) { + setfvalue(L->top, fn); + } + else { + Closure *cl; + api_checknelems(L, n); + api_check(L, n <= MAXUPVAL, "upvalue index too large"); + luaC_checkGC(L); + cl = luaF_newCclosure(L, n); + cl->c.f = fn; + L->top -= n; + while (n--) + setobj2n(L, &cl->c.upvalue[n], L->top + n); + setclCvalue(L, L->top, cl); + } + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API void lua_pushboolean (lua_State *L, int b) { + lua_lock(L); + setbvalue(L->top, (b != 0)); /* ensure that true is 1 */ + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API void lua_pushlightuserdata (lua_State *L, void *p) { + lua_lock(L); + setpvalue(L->top, p); + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API int lua_pushthread (lua_State *L) { + lua_lock(L); + setthvalue(L, L->top, L); + api_incr_top(L); + lua_unlock(L); + return (G(L)->mainthread == L); +} + + + +/* +** get functions (Lua -> stack) +*/ + + +LUA_API void lua_getglobal (lua_State *L, const char *var) { + Table *reg = hvalue(&G(L)->l_registry); + const TValue *gt; /* global table */ + lua_lock(L); + gt = luaH_getint(reg, LUA_RIDX_GLOBALS); + setsvalue2s(L, L->top++, luaS_new(L, var)); + luaV_gettable(L, gt, L->top - 1, L->top - 1); + lua_unlock(L); +} + + +LUA_API void lua_gettable (lua_State *L, int idx) { + StkId t; + lua_lock(L); + t = index2addr(L, idx); + luaV_gettable(L, t, L->top - 1, L->top - 1); + lua_unlock(L); +} + + +LUA_API void lua_getfield (lua_State *L, int idx, const char *k) { + StkId t; + lua_lock(L); + t = index2addr(L, idx); + setsvalue2s(L, L->top, luaS_new(L, k)); + api_incr_top(L); + luaV_gettable(L, t, L->top - 1, L->top - 1); + lua_unlock(L); +} + + +LUA_API void lua_rawget (lua_State *L, int idx) { + StkId t; + lua_lock(L); + t = index2addr(L, idx); + api_check(L, ttistable(t), "table expected"); + setobj2s(L, L->top - 1, luaH_get(hvalue(t), L->top - 1)); + lua_unlock(L); +} + + +LUA_API void lua_rawgeti (lua_State *L, int idx, int n) { + StkId t; + lua_lock(L); + t = index2addr(L, idx); + api_check(L, ttistable(t), "table expected"); + setobj2s(L, L->top, luaH_getint(hvalue(t), n)); + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API void lua_rawgetp (lua_State *L, int idx, const void *p) { + StkId t; + TValue k; + lua_lock(L); + t = index2addr(L, idx); + api_check(L, ttistable(t), "table expected"); + setpvalue(&k, cast(void *, p)); + setobj2s(L, L->top, luaH_get(hvalue(t), &k)); + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API void lua_createtable (lua_State *L, int narray, int nrec) { + Table *t; + lua_lock(L); + luaC_checkGC(L); + t = luaH_new(L); + sethvalue(L, L->top, t); + api_incr_top(L); + if (narray > 0 || nrec > 0) + luaH_resize(L, t, narray, nrec); + lua_unlock(L); +} + + +LUA_API int lua_getmetatable (lua_State *L, int objindex) { + const TValue *obj; + Table *mt = NULL; + int res; + lua_lock(L); + obj = index2addr(L, objindex); + switch (ttypenv(obj)) { + case LUA_TTABLE: + mt = hvalue(obj)->metatable; + break; + case LUA_TUSERDATA: + mt = uvalue(obj)->metatable; + break; + default: + mt = G(L)->mt[ttypenv(obj)]; + break; + } + if (mt == NULL) + res = 0; + else { + sethvalue(L, L->top, mt); + api_incr_top(L); + res = 1; + } + lua_unlock(L); + return res; +} + + +LUA_API void lua_getuservalue (lua_State *L, int idx) { + StkId o; + 
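+ /* pushes the table associated with the userdata at 'idx', or nil */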
lua_lock(L); + o = index2addr(L, idx); + api_check(L, ttisuserdata(o), "userdata expected"); + if (uvalue(o)->env) { + sethvalue(L, L->top, uvalue(o)->env); + } else + setnilvalue(L->top); + api_incr_top(L); + lua_unlock(L); +} + + +/* +** set functions (stack -> Lua) +*/ + + +LUA_API void lua_setglobal (lua_State *L, const char *var) { + Table *reg = hvalue(&G(L)->l_registry); + const TValue *gt; /* global table */ + lua_lock(L); + api_checknelems(L, 1); + gt = luaH_getint(reg, LUA_RIDX_GLOBALS); + setsvalue2s(L, L->top++, luaS_new(L, var)); + luaV_settable(L, gt, L->top - 1, L->top - 2); + L->top -= 2; /* pop value and key */ + lua_unlock(L); +} + + +LUA_API void lua_settable (lua_State *L, int idx) { + StkId t; + lua_lock(L); + api_checknelems(L, 2); + t = index2addr(L, idx); + luaV_settable(L, t, L->top - 2, L->top - 1); + L->top -= 2; /* pop index and value */ + lua_unlock(L); +} + + +LUA_API void lua_setfield (lua_State *L, int idx, const char *k) { + StkId t; + lua_lock(L); + api_checknelems(L, 1); + t = index2addr(L, idx); + setsvalue2s(L, L->top++, luaS_new(L, k)); + luaV_settable(L, t, L->top - 1, L->top - 2); + L->top -= 2; /* pop value and key */ + lua_unlock(L); +} + + +LUA_API void lua_rawset (lua_State *L, int idx) { + StkId t; + lua_lock(L); + api_checknelems(L, 2); + t = index2addr(L, idx); + api_check(L, ttistable(t), "table expected"); + setobj2t(L, luaH_set(L, hvalue(t), L->top-2), L->top-1); + invalidateTMcache(hvalue(t)); + luaC_barrierback(L, gcvalue(t), L->top-1); + L->top -= 2; + lua_unlock(L); +} + + +LUA_API void lua_rawseti (lua_State *L, int idx, int n) { + StkId t; + lua_lock(L); + api_checknelems(L, 1); + t = index2addr(L, idx); + api_check(L, ttistable(t), "table expected"); + luaH_setint(L, hvalue(t), n, L->top - 1); + luaC_barrierback(L, gcvalue(t), L->top-1); + L->top--; + lua_unlock(L); +} + + +LUA_API void lua_rawsetp (lua_State *L, int idx, const void *p) { + StkId t; + TValue k; + lua_lock(L); + api_checknelems(L, 1); + t = index2addr(L, idx); + api_check(L, ttistable(t), "table expected"); + setpvalue(&k, cast(void *, p)); + setobj2t(L, luaH_set(L, hvalue(t), &k), L->top - 1); + luaC_barrierback(L, gcvalue(t), L->top - 1); + L->top--; + lua_unlock(L); +} + + +LUA_API int lua_setmetatable (lua_State *L, int objindex) { + TValue *obj; + Table *mt; + lua_lock(L); + api_checknelems(L, 1); + obj = index2addr(L, objindex); + if (ttisnil(L->top - 1)) + mt = NULL; + else { + api_check(L, ttistable(L->top - 1), "table expected"); + mt = hvalue(L->top - 1); + } + switch (ttypenv(obj)) { + case LUA_TTABLE: { + hvalue(obj)->metatable = mt; + if (mt) { + luaC_objbarrierback(L, gcvalue(obj), mt); + luaC_checkfinalizer(L, gcvalue(obj), mt); + } + break; + } + case LUA_TUSERDATA: { + uvalue(obj)->metatable = mt; + if (mt) { + luaC_objbarrier(L, rawuvalue(obj), mt); + luaC_checkfinalizer(L, gcvalue(obj), mt); + } + break; + } + default: { + G(L)->mt[ttypenv(obj)] = mt; + break; + } + } + L->top--; + lua_unlock(L); + return 1; +} + + +LUA_API void lua_setuservalue (lua_State *L, int idx) { + StkId o; + lua_lock(L); + api_checknelems(L, 1); + o = index2addr(L, idx); + api_check(L, ttisuserdata(o), "userdata expected"); + if (ttisnil(L->top - 1)) + uvalue(o)->env = NULL; + else { + api_check(L, ttistable(L->top - 1), "table expected"); + uvalue(o)->env = hvalue(L->top - 1); + luaC_objbarrier(L, gcvalue(o), hvalue(L->top - 1)); + } + L->top--; + lua_unlock(L); +} + + +/* +** `load' and `call' functions (run Lua code) +*/ + + +#define checkresults(L,na,nr) \ + 
api_check(L, (nr) == LUA_MULTRET || (L->ci->top - L->top >= (nr) - (na)), \ + "results from function overflow current stack size") + + +LUA_API int lua_getctx (lua_State *L, int *ctx) { + if (L->ci->callstatus & CIST_YIELDED) { + if (ctx) *ctx = L->ci->u.c.ctx; + return L->ci->u.c.status; + } + else return LUA_OK; +} + + +LUA_API void lua_callk (lua_State *L, int nargs, int nresults, int ctx, + lua_CFunction k) { + StkId func; + lua_lock(L); + api_check(L, k == NULL || !isLua(L->ci), + "cannot use continuations inside hooks"); + api_checknelems(L, nargs+1); + api_check(L, L->status == LUA_OK, "cannot do calls on non-normal thread"); + checkresults(L, nargs, nresults); + func = L->top - (nargs+1); + if (k != NULL && L->nny == 0) { /* need to prepare continuation? */ + L->ci->u.c.k = k; /* save continuation */ + L->ci->u.c.ctx = ctx; /* save context */ + luaD_call(L, func, nresults, 1); /* do the call */ + } + else /* no continuation or no yieldable */ + luaD_call(L, func, nresults, 0); /* just do the call */ + adjustresults(L, nresults); + lua_unlock(L); +} + + + +/* +** Execute a protected call. +*/ +struct CallS { /* data to `f_call' */ + StkId func; + int nresults; +}; + + +static void f_call (lua_State *L, void *ud) { + struct CallS *c = cast(struct CallS *, ud); + luaD_call(L, c->func, c->nresults, 0); +} + + + +LUA_API int lua_pcallk (lua_State *L, int nargs, int nresults, int errfunc, + int ctx, lua_CFunction k) { + struct CallS c; + int status; + ptrdiff_t func; + lua_lock(L); + api_check(L, k == NULL || !isLua(L->ci), + "cannot use continuations inside hooks"); + api_checknelems(L, nargs+1); + api_check(L, L->status == LUA_OK, "cannot do calls on non-normal thread"); + checkresults(L, nargs, nresults); + if (errfunc == 0) + func = 0; + else { + StkId o = index2addr(L, errfunc); + api_checkstackindex(L, errfunc, o); + func = savestack(L, o); + } + c.func = L->top - (nargs+1); /* function to be called */ + if (k == NULL || L->nny > 0) { /* no continuation or no yieldable? */ + c.nresults = nresults; /* do a 'conventional' protected call */ + status = luaD_pcall(L, f_call, &c, savestack(L, c.func), func); + } + else { /* prepare continuation (call is already protected by 'resume') */ + CallInfo *ci = L->ci; + ci->u.c.k = k; /* save continuation */ + ci->u.c.ctx = ctx; /* save context */ + /* save information for error recovery */ + ci->extra = savestack(L, c.func); + ci->u.c.old_allowhook = L->allowhook; + ci->u.c.old_errfunc = L->errfunc; + L->errfunc = func; + /* mark that function may do error recovery */ + ci->callstatus |= CIST_YPCALL; + luaD_call(L, c.func, nresults, 1); /* do the call */ + ci->callstatus &= ~CIST_YPCALL; + L->errfunc = ci->u.c.old_errfunc; + status = LUA_OK; /* if it is here, there were no errors */ + } + adjustresults(L, nresults); + lua_unlock(L); + return status; +} + + +LUA_API int lua_load (lua_State *L, lua_Reader reader, void *data, + const char *chunkname, const char *mode) { + ZIO z; + int status; + lua_lock(L); + if (!chunkname) chunkname = "?"; + luaZ_init(L, &z, reader, data); + status = luaD_protectedparser(L, &z, chunkname, mode); + if (status == LUA_OK) { /* no errors? */ + LClosure *f = clLvalue(L->top - 1); /* get newly created function */ + if (f->nupvalues == 1) { /* does it have one upvalue? 
*/ + /* get global table from registry */ + Table *reg = hvalue(&G(L)->l_registry); + const TValue *gt = luaH_getint(reg, LUA_RIDX_GLOBALS); + /* set global table as 1st upvalue of 'f' (may be LUA_ENV) */ + setobj(L, f->upvals[0]->v, gt); + luaC_barrier(L, f->upvals[0], gt); + } + } + lua_unlock(L); + return status; +} + + +LUA_API int lua_dump (lua_State *L, lua_Writer writer, void *data) { + int status; + TValue *o; + lua_lock(L); + api_checknelems(L, 1); + o = L->top - 1; + if (isLfunction(o)) + status = luaU_dump(L, getproto(o), writer, data, 0); + else + status = 1; + lua_unlock(L); + return status; +} + + +LUA_API int lua_status (lua_State *L) { + return L->status; +} + + +/* +** Garbage-collection function +*/ + +LUA_API int lua_gc (lua_State *L, int what, int data) { + int res = 0; + global_State *g; + lua_lock(L); + g = G(L); + switch (what) { + case LUA_GCSTOP: { + g->gcrunning = 0; + break; + } + case LUA_GCRESTART: { + luaE_setdebt(g, 0); + g->gcrunning = 1; + break; + } + case LUA_GCCOLLECT: { + luaC_fullgc(L, 0); + break; + } + case LUA_GCCOUNT: { + /* GC values are expressed in Kbytes: #bytes/2^10 */ + res = cast_int(gettotalbytes(g) >> 10); + break; + } + case LUA_GCCOUNTB: { + res = cast_int(gettotalbytes(g) & 0x3ff); + break; + } + case LUA_GCSTEP: { + if (g->gckind == KGC_GEN) { /* generational mode? */ + res = (g->GCestimate == 0); /* true if it will do major collection */ + luaC_forcestep(L); /* do a single step */ + } + else { + lu_mem debt = cast(lu_mem, data) * 1024 - GCSTEPSIZE; + if (g->gcrunning) + debt += g->GCdebt; /* include current debt */ + luaE_setdebt(g, debt); + luaC_forcestep(L); + if (g->gcstate == GCSpause) /* end of cycle? */ + res = 1; /* signal it */ + } + break; + } + case LUA_GCSETPAUSE: { + res = g->gcpause; + g->gcpause = data; + break; + } + case LUA_GCSETMAJORINC: { + res = g->gcmajorinc; + g->gcmajorinc = data; + break; + } + case LUA_GCSETSTEPMUL: { + res = g->gcstepmul; + g->gcstepmul = data; + break; + } + case LUA_GCISRUNNING: { + res = g->gcrunning; + break; + } + case LUA_GCGEN: { /* change collector to generational mode */ + luaC_changemode(L, KGC_GEN); + break; + } + case LUA_GCINC: { /* change collector to incremental mode */ + luaC_changemode(L, KGC_NORMAL); + break; + } + default: res = -1; /* invalid option */ + } + lua_unlock(L); + return res; +} + + + +/* +** miscellaneous functions +*/ + + +LUA_API int lua_error (lua_State *L) { + lua_lock(L); + api_checknelems(L, 1); + luaG_errormsg(L); + /* code unreachable; will unlock when control actually leaves the kernel */ + return 0; /* to avoid warnings */ +} + + +LUA_API int lua_next (lua_State *L, int idx) { + StkId t; + int more; + lua_lock(L); + t = index2addr(L, idx); + api_check(L, ttistable(t), "table expected"); + more = luaH_next(L, hvalue(t), L->top - 1); + if (more) { + api_incr_top(L); + } + else /* no more elements */ + L->top -= 1; /* remove key */ + lua_unlock(L); + return more; +} + + +LUA_API void lua_concat (lua_State *L, int n) { + lua_lock(L); + api_checknelems(L, n); + if (n >= 2) { + luaC_checkGC(L); + luaV_concat(L, n); + } + else if (n == 0) { /* push empty string */ + setsvalue2s(L, L->top, luaS_newlstr(L, "", 0)); + api_incr_top(L); + } + /* else n == 1; nothing to do */ + lua_unlock(L); +} + + +LUA_API void lua_len (lua_State *L, int idx) { + StkId t; + lua_lock(L); + t = index2addr(L, idx); + luaV_objlen(L, L->top, t); + api_incr_top(L); + lua_unlock(L); +} + + +LUA_API lua_Alloc lua_getallocf (lua_State *L, void **ud) { + lua_Alloc f; + lua_lock(L); + if 
(ud) *ud = G(L)->ud;
+  f = G(L)->frealloc;
+  lua_unlock(L);
+  return f;
+}
+
+
+LUA_API void lua_setallocf (lua_State *L, lua_Alloc f, void *ud) {
+  lua_lock(L);
+  G(L)->ud = ud;
+  G(L)->frealloc = f;
+  lua_unlock(L);
+}
+
+
+LUA_API void *lua_newuserdata (lua_State *L, size_t size) {
+  Udata *u;
+  lua_lock(L);
+  luaC_checkGC(L);
+  u = luaS_newudata(L, size, NULL);
+  setuvalue(L, L->top, u);
+  api_incr_top(L);
+  lua_unlock(L);
+  return u + 1;
+}
+
+
+
+static const char *aux_upvalue (StkId fi, int n, TValue **val,
+                                GCObject **owner) {
+  switch (ttype(fi)) {
+    case LUA_TCCL: {  /* C closure */
+      CClosure *f = clCvalue(fi);
+      if (!(1 <= n && n <= f->nupvalues)) return NULL;
+      *val = &f->upvalue[n-1];
+      if (owner) *owner = obj2gco(f);
+      return "";
+    }
+    case LUA_TLCL: {  /* Lua closure */
+      LClosure *f = clLvalue(fi);
+      TString *name;
+      Proto *p = f->p;
+      if (!(1 <= n && n <= p->sizeupvalues)) return NULL;
+      *val = f->upvals[n-1]->v;
+      if (owner) *owner = obj2gco(f->upvals[n - 1]);
+      name = p->upvalues[n-1].name;
+      return (name == NULL) ? "" : getstr(name);
+    }
+    default: return NULL;  /* not a closure */
+  }
+}
+
+
+LUA_API const char *lua_getupvalue (lua_State *L, int funcindex, int n) {
+  const char *name;
+  TValue *val = NULL;  /* to avoid warnings */
+  lua_lock(L);
+  name = aux_upvalue(index2addr(L, funcindex), n, &val, NULL);
+  if (name) {
+    setobj2s(L, L->top, val);
+    api_incr_top(L);
+  }
+  lua_unlock(L);
+  return name;
+}
+
+
+LUA_API const char *lua_setupvalue (lua_State *L, int funcindex, int n) {
+  const char *name;
+  TValue *val = NULL;  /* to avoid warnings */
+  GCObject *owner = NULL;  /* to avoid warnings */
+  StkId fi;
+  lua_lock(L);
+  fi = index2addr(L, funcindex);
+  api_checknelems(L, 1);
+  name = aux_upvalue(fi, n, &val, &owner);
+  if (name) {
+    L->top--;
+    setobj(L, val, L->top);
+    luaC_barrier(L, owner, L->top);
+  }
+  lua_unlock(L);
+  return name;
+}
+
+
+static UpVal **getupvalref (lua_State *L, int fidx, int n, LClosure **pf) {
+  LClosure *f;
+  StkId fi = index2addr(L, fidx);
+  api_check(L, ttisLclosure(fi), "Lua function expected");
+  f = clLvalue(fi);
+  api_check(L, (1 <= n && n <= f->p->sizeupvalues), "invalid upvalue index");
+  if (pf) *pf = f;
+  return &f->upvals[n - 1];  /* get its upvalue pointer */
+}
+
+
+LUA_API void *lua_upvalueid (lua_State *L, int fidx, int n) {
+  StkId fi = index2addr(L, fidx);
+  switch (ttype(fi)) {
+    case LUA_TLCL: {  /* lua closure */
+      return *getupvalref(L, fidx, n, NULL);
+    }
+    case LUA_TCCL: {  /* C closure */
+      CClosure *f = clCvalue(fi);
+      api_check(L, 1 <= n && n <= f->nupvalues, "invalid upvalue index");
+      return &f->upvalue[n - 1];
+    }
+    default: {
+      api_check(L, 0, "closure expected");
+      return NULL;
+    }
+  }
+}
+
+
+LUA_API void lua_upvaluejoin (lua_State *L, int fidx1, int n1,
+                                            int fidx2, int n2) {
+  LClosure *f1;
+  UpVal **up1 = getupvalref(L, fidx1, n1, &f1);
+  UpVal **up2 = getupvalref(L, fidx2, n2, NULL);
+  *up1 = *up2;
+  luaC_objbarrier(L, f1, *up2);
+}
+
diff --git a/ext/lua/src/lauxlib.c b/ext/lua/src/lauxlib.c
new file mode 100644
index 000000000..2e989d661
--- /dev/null
+++ b/ext/lua/src/lauxlib.c
@@ -0,0 +1,959 @@
+/*
+** $Id: lauxlib.c,v 1.248 2013/03/21 13:54:57 roberto Exp $
+** Auxiliary functions for building Lua libraries
+** See Copyright Notice in lua.h
+*/
+
+
+#include <errno.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+
+/* This file uses only the official API of Lua.
+** Any function declared here could be written as an application function. 
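+**
+** For instance, an application could add its own check on top of the
+** plain API, in the same style as the functions below (a sketch;
+** 'my_checkpositive' is an illustrative name):
+**
+**   static lua_Number my_checkpositive (lua_State *L, int narg) {
+**     int isnum;
+**     lua_Number d = lua_tonumberx(L, narg, &isnum);
+**     if (!isnum || d <= 0)
+**       luaL_argerror(L, narg, "positive number expected");
+**     return d;
+**   }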
+*/ + +#define lauxlib_c +#define LUA_LIB + +#include "lua.h" + +#include "lauxlib.h" + + +/* +** {====================================================== +** Traceback +** ======================================================= +*/ + + +#define LEVELS1 12 /* size of the first part of the stack */ +#define LEVELS2 10 /* size of the second part of the stack */ + + + +/* +** search for 'objidx' in table at index -1. +** return 1 + string at top if find a good name. +*/ +static int findfield (lua_State *L, int objidx, int level) { + if (level == 0 || !lua_istable(L, -1)) + return 0; /* not found */ + lua_pushnil(L); /* start 'next' loop */ + while (lua_next(L, -2)) { /* for each pair in table */ + if (lua_type(L, -2) == LUA_TSTRING) { /* ignore non-string keys */ + if (lua_rawequal(L, objidx, -1)) { /* found object? */ + lua_pop(L, 1); /* remove value (but keep name) */ + return 1; + } + else if (findfield(L, objidx, level - 1)) { /* try recursively */ + lua_remove(L, -2); /* remove table (but keep name) */ + lua_pushliteral(L, "."); + lua_insert(L, -2); /* place '.' between the two names */ + lua_concat(L, 3); + return 1; + } + } + lua_pop(L, 1); /* remove value */ + } + return 0; /* not found */ +} + + +static int pushglobalfuncname (lua_State *L, lua_Debug *ar) { + int top = lua_gettop(L); + lua_getinfo(L, "f", ar); /* push function */ + lua_pushglobaltable(L); + if (findfield(L, top + 1, 2)) { + lua_copy(L, -1, top + 1); /* move name to proper place */ + lua_pop(L, 2); /* remove pushed values */ + return 1; + } + else { + lua_settop(L, top); /* remove function and global table */ + return 0; + } +} + + +static void pushfuncname (lua_State *L, lua_Debug *ar) { + if (*ar->namewhat != '\0') /* is there a name? */ + lua_pushfstring(L, "function " LUA_QS, ar->name); + else if (*ar->what == 'm') /* main? */ + lua_pushliteral(L, "main chunk"); + else if (*ar->what == 'C') { + if (pushglobalfuncname(L, ar)) { + lua_pushfstring(L, "function " LUA_QS, lua_tostring(L, -1)); + lua_remove(L, -2); /* remove name */ + } + else + lua_pushliteral(L, "?"); + } + else + lua_pushfstring(L, "function <%s:%d>", ar->short_src, ar->linedefined); +} + + +static int countlevels (lua_State *L) { + lua_Debug ar; + int li = 1, le = 1; + /* find an upper bound */ + while (lua_getstack(L, le, &ar)) { li = le; le *= 2; } + /* do a binary search */ + while (li < le) { + int m = (li + le)/2; + if (lua_getstack(L, m, &ar)) li = m + 1; + else le = m; + } + return le - 1; +} + + +LUALIB_API void luaL_traceback (lua_State *L, lua_State *L1, + const char *msg, int level) { + lua_Debug ar; + int top = lua_gettop(L); + int numlevels = countlevels(L1); + int mark = (numlevels > LEVELS1 + LEVELS2) ? LEVELS1 : 0; + if (msg) lua_pushfstring(L, "%s\n", msg); + lua_pushliteral(L, "stack traceback:"); + while (lua_getstack(L1, level++, &ar)) { + if (level == mark) { /* too many levels? */ + lua_pushliteral(L, "\n\t..."); /* add a '...' 
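+      ** (Aside — luaL_traceback is typically used from a message handler
+      ** passed to lua_pcall; a sketch, with 'msghandler' illustrative:
+      **
+      **   static int msghandler (lua_State *L) {
+      **     luaL_traceback(L, L, lua_tostring(L, 1), 1);
+      **     return 1;  // replace error message by message + traceback
+      **   }
+      ** )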
*/
+      level = numlevels - LEVELS2;  /* and skip to last ones */
+    }
+    else {
+      lua_getinfo(L1, "Slnt", &ar);
+      lua_pushfstring(L, "\n\t%s:", ar.short_src);
+      if (ar.currentline > 0)
+        lua_pushfstring(L, "%d:", ar.currentline);
+      lua_pushliteral(L, " in ");
+      pushfuncname(L, &ar);
+      if (ar.istailcall)
+        lua_pushliteral(L, "\n\t(...tail calls...)");
+      lua_concat(L, lua_gettop(L) - top);
+    }
+  }
+  lua_concat(L, lua_gettop(L) - top);
+}
+
+/* }====================================================== */
+
+
+/*
+** {======================================================
+** Error-report functions
+** =======================================================
+*/
+
+LUALIB_API int luaL_argerror (lua_State *L, int narg, const char *extramsg) {
+  lua_Debug ar;
+  if (!lua_getstack(L, 0, &ar))  /* no stack frame? */
+    return luaL_error(L, "bad argument #%d (%s)", narg, extramsg);
+  lua_getinfo(L, "n", &ar);
+  if (strcmp(ar.namewhat, "method") == 0) {
+    narg--;  /* do not count `self' */
+    if (narg == 0)  /* error is in the self argument itself? */
+      return luaL_error(L, "calling " LUA_QS " on bad self (%s)",
+                           ar.name, extramsg);
+  }
+  if (ar.name == NULL)
+    ar.name = (pushglobalfuncname(L, &ar)) ? lua_tostring(L, -1) : "?";
+  return luaL_error(L, "bad argument #%d to " LUA_QS " (%s)",
+                        narg, ar.name, extramsg);
+}
+
+
+static int typeerror (lua_State *L, int narg, const char *tname) {
+  const char *msg = lua_pushfstring(L, "%s expected, got %s",
+                                    tname, luaL_typename(L, narg));
+  return luaL_argerror(L, narg, msg);
+}
+
+
+static void tag_error (lua_State *L, int narg, int tag) {
+  typeerror(L, narg, lua_typename(L, tag));
+}
+
+
+LUALIB_API void luaL_where (lua_State *L, int level) {
+  lua_Debug ar;
+  if (lua_getstack(L, level, &ar)) {  /* check function at level */
+    lua_getinfo(L, "Sl", &ar);  /* get info about it */
+    if (ar.currentline > 0) {  /* is there info? */
+      lua_pushfstring(L, "%s:%d: ", ar.short_src, ar.currentline);
+      return;
+    }
+  }
+  lua_pushliteral(L, "");  /* else, no information available... */
+}
+
+
+LUALIB_API int luaL_error (lua_State *L, const char *fmt, ...) {
+  va_list argp;
+  va_start(argp, fmt);
+  luaL_where(L, 1);
+  lua_pushvfstring(L, fmt, argp);
+  va_end(argp);
+  lua_concat(L, 2);
+  return lua_error(L);
+}
+
+
+LUALIB_API int luaL_fileresult (lua_State *L, int stat, const char *fname) {
+  int en = errno;  /* calls to Lua API may change this value */
+  if (stat) {
+    lua_pushboolean(L, 1);
+    return 1;
+  }
+  else {
+    lua_pushnil(L);
+    if (fname)
+      lua_pushfstring(L, "%s: %s", fname, strerror(en));
+    else
+      lua_pushstring(L, strerror(en));
+    lua_pushinteger(L, en);
+    return 3;
+  }
+}
+
+
+#if !defined(inspectstat)  /* { */
+
+#if defined(LUA_USE_POSIX)
+
+#include <sys/wait.h>
+
+/*
+** use appropriate macros to interpret 'pclose' return status
+*/
+#define inspectstat(stat,what)  \
+   if (WIFEXITED(stat)) { stat = WEXITSTATUS(stat); } \
+   else if (WIFSIGNALED(stat)) { stat = WTERMSIG(stat); what = "signal"; }
+
+#else
+
+#define inspectstat(stat,what)  /* no op */
+
+#endif
+
+#endif  /* } */
+
+
+LUALIB_API int luaL_execresult (lua_State *L, int stat) {
+  const char *what = "exit";  /* type of termination */
+  if (stat == -1)  /* error? */
+    return luaL_fileresult(L, 0, NULL);
+  else {
+    inspectstat(stat, what);  /* interpret result */
+    if (*what == 'e' && stat == 0)  /* successful termination? 
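+    ** (Aside — sketch of a typical caller, in the style of the standard
+    ** 'os' library; 'os_execute' here is illustrative:
+    **
+    **   static int os_execute (lua_State *L) {
+    **     const char *cmd = luaL_optstring(L, 1, NULL);
+    **     int stat = system(cmd);   // needs <stdlib.h>
+    **     if (cmd != NULL)
+    **       return luaL_execresult(L, stat);
+    **     else {
+    **       lua_pushboolean(L, stat);  // true if a shell is available
+    **       return 1;
+    **     }
+    **   }
+    ** )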
*/ + lua_pushboolean(L, 1); + else + lua_pushnil(L); + lua_pushstring(L, what); + lua_pushinteger(L, stat); + return 3; /* return true/nil,what,code */ + } +} + +/* }====================================================== */ + + +/* +** {====================================================== +** Userdata's metatable manipulation +** ======================================================= +*/ + +LUALIB_API int luaL_newmetatable (lua_State *L, const char *tname) { + luaL_getmetatable(L, tname); /* try to get metatable */ + if (!lua_isnil(L, -1)) /* name already in use? */ + return 0; /* leave previous value on top, but return 0 */ + lua_pop(L, 1); + lua_newtable(L); /* create metatable */ + lua_pushvalue(L, -1); + lua_setfield(L, LUA_REGISTRYINDEX, tname); /* registry.name = metatable */ + return 1; +} + + +LUALIB_API void luaL_setmetatable (lua_State *L, const char *tname) { + luaL_getmetatable(L, tname); + lua_setmetatable(L, -2); +} + + +LUALIB_API void *luaL_testudata (lua_State *L, int ud, const char *tname) { + void *p = lua_touserdata(L, ud); + if (p != NULL) { /* value is a userdata? */ + if (lua_getmetatable(L, ud)) { /* does it have a metatable? */ + luaL_getmetatable(L, tname); /* get correct metatable */ + if (!lua_rawequal(L, -1, -2)) /* not the same? */ + p = NULL; /* value is a userdata with wrong metatable */ + lua_pop(L, 2); /* remove both metatables */ + return p; + } + } + return NULL; /* value is not a userdata with a metatable */ +} + + +LUALIB_API void *luaL_checkudata (lua_State *L, int ud, const char *tname) { + void *p = luaL_testudata(L, ud, tname); + if (p == NULL) typeerror(L, ud, tname); + return p; +} + +/* }====================================================== */ + + +/* +** {====================================================== +** Argument check functions +** ======================================================= +*/ + +LUALIB_API int luaL_checkoption (lua_State *L, int narg, const char *def, + const char *const lst[]) { + const char *name = (def) ? luaL_optstring(L, narg, def) : + luaL_checkstring(L, narg); + int i; + for (i=0; lst[i]; i++) + if (strcmp(lst[i], name) == 0) + return i; + return luaL_argerror(L, narg, + lua_pushfstring(L, "invalid option " LUA_QS, name)); +} + + +LUALIB_API void luaL_checkstack (lua_State *L, int space, const char *msg) { + /* keep some extra space to run error routines, if needed */ + const int extra = LUA_MINSTACK; + if (!lua_checkstack(L, space + extra)) { + if (msg) + luaL_error(L, "stack overflow (%s)", msg); + else + luaL_error(L, "stack overflow"); + } +} + + +LUALIB_API void luaL_checktype (lua_State *L, int narg, int t) { + if (lua_type(L, narg) != t) + tag_error(L, narg, t); +} + + +LUALIB_API void luaL_checkany (lua_State *L, int narg) { + if (lua_type(L, narg) == LUA_TNONE) + luaL_argerror(L, narg, "value expected"); +} + + +LUALIB_API const char *luaL_checklstring (lua_State *L, int narg, size_t *len) { + const char *s = lua_tolstring(L, narg, len); + if (!s) tag_error(L, narg, LUA_TSTRING); + return s; +} + + +LUALIB_API const char *luaL_optlstring (lua_State *L, int narg, + const char *def, size_t *len) { + if (lua_isnoneornil(L, narg)) { + if (len) + *len = (def ? 
strlen(def) : 0); + return def; + } + else return luaL_checklstring(L, narg, len); +} + + +LUALIB_API lua_Number luaL_checknumber (lua_State *L, int narg) { + int isnum; + lua_Number d = lua_tonumberx(L, narg, &isnum); + if (!isnum) + tag_error(L, narg, LUA_TNUMBER); + return d; +} + + +LUALIB_API lua_Number luaL_optnumber (lua_State *L, int narg, lua_Number def) { + return luaL_opt(L, luaL_checknumber, narg, def); +} + + +LUALIB_API lua_Integer luaL_checkinteger (lua_State *L, int narg) { + int isnum; + lua_Integer d = lua_tointegerx(L, narg, &isnum); + if (!isnum) + tag_error(L, narg, LUA_TNUMBER); + return d; +} + + +LUALIB_API lua_Unsigned luaL_checkunsigned (lua_State *L, int narg) { + int isnum; + lua_Unsigned d = lua_tounsignedx(L, narg, &isnum); + if (!isnum) + tag_error(L, narg, LUA_TNUMBER); + return d; +} + + +LUALIB_API lua_Integer luaL_optinteger (lua_State *L, int narg, + lua_Integer def) { + return luaL_opt(L, luaL_checkinteger, narg, def); +} + + +LUALIB_API lua_Unsigned luaL_optunsigned (lua_State *L, int narg, + lua_Unsigned def) { + return luaL_opt(L, luaL_checkunsigned, narg, def); +} + +/* }====================================================== */ + + +/* +** {====================================================== +** Generic Buffer manipulation +** ======================================================= +*/ + +/* +** check whether buffer is using a userdata on the stack as a temporary +** buffer +*/ +#define buffonstack(B) ((B)->b != (B)->initb) + + +/* +** returns a pointer to a free area with at least 'sz' bytes +*/ +LUALIB_API char *luaL_prepbuffsize (luaL_Buffer *B, size_t sz) { + lua_State *L = B->L; + if (B->size - B->n < sz) { /* not enough space? */ + char *newbuff; + size_t newsize = B->size * 2; /* double buffer size */ + if (newsize - B->n < sz) /* not big enough? */ + newsize = B->n + sz; + if (newsize < B->n || newsize - B->n < sz) + luaL_error(L, "buffer too large"); + /* create larger buffer */ + newbuff = (char *)lua_newuserdata(L, newsize * sizeof(char)); + /* move content to new buffer */ + memcpy(newbuff, B->b, B->n * sizeof(char)); + if (buffonstack(B)) + lua_remove(L, -2); /* remove old buffer */ + B->b = newbuff; + B->size = newsize; + } + return &B->b[B->n]; +} + + +LUALIB_API void luaL_addlstring (luaL_Buffer *B, const char *s, size_t l) { + char *b = luaL_prepbuffsize(B, l); + memcpy(b, s, l * sizeof(char)); + luaL_addsize(B, l); +} + + +LUALIB_API void luaL_addstring (luaL_Buffer *B, const char *s) { + luaL_addlstring(B, s, strlen(s)); +} + + +LUALIB_API void luaL_pushresult (luaL_Buffer *B) { + lua_State *L = B->L; + lua_pushlstring(L, B->b, B->n); + if (buffonstack(B)) + lua_remove(L, -2); /* remove old buffer */ +} + + +LUALIB_API void luaL_pushresultsize (luaL_Buffer *B, size_t sz) { + luaL_addsize(B, sz); + luaL_pushresult(B); +} + + +LUALIB_API void luaL_addvalue (luaL_Buffer *B) { + lua_State *L = B->L; + size_t l; + const char *s = lua_tolstring(L, -1, &l); + if (buffonstack(B)) + lua_insert(L, -2); /* put value below buffer */ + luaL_addlstring(B, s, l); + lua_remove(L, (buffonstack(B)) ? 
-2 : -1); /* remove value */ +} + + +LUALIB_API void luaL_buffinit (lua_State *L, luaL_Buffer *B) { + B->L = L; + B->b = B->initb; + B->n = 0; + B->size = LUAL_BUFFERSIZE; +} + + +LUALIB_API char *luaL_buffinitsize (lua_State *L, luaL_Buffer *B, size_t sz) { + luaL_buffinit(L, B); + return luaL_prepbuffsize(B, sz); +} + +/* }====================================================== */ + + +/* +** {====================================================== +** Reference system +** ======================================================= +*/ + +/* index of free-list header */ +#define freelist 0 + + +LUALIB_API int luaL_ref (lua_State *L, int t) { + int ref; + if (lua_isnil(L, -1)) { + lua_pop(L, 1); /* remove from stack */ + return LUA_REFNIL; /* `nil' has a unique fixed reference */ + } + t = lua_absindex(L, t); + lua_rawgeti(L, t, freelist); /* get first free element */ + ref = (int)lua_tointeger(L, -1); /* ref = t[freelist] */ + lua_pop(L, 1); /* remove it from stack */ + if (ref != 0) { /* any free element? */ + lua_rawgeti(L, t, ref); /* remove it from list */ + lua_rawseti(L, t, freelist); /* (t[freelist] = t[ref]) */ + } + else /* no free elements */ + ref = (int)lua_rawlen(L, t) + 1; /* get a new reference */ + lua_rawseti(L, t, ref); + return ref; +} + + +LUALIB_API void luaL_unref (lua_State *L, int t, int ref) { + if (ref >= 0) { + t = lua_absindex(L, t); + lua_rawgeti(L, t, freelist); + lua_rawseti(L, t, ref); /* t[ref] = t[freelist] */ + lua_pushinteger(L, ref); + lua_rawseti(L, t, freelist); /* t[freelist] = ref */ + } +} + +/* }====================================================== */ + + +/* +** {====================================================== +** Load functions +** ======================================================= +*/ + +typedef struct LoadF { + int n; /* number of pre-read characters */ + FILE *f; /* file being read */ + char buff[LUAL_BUFFERSIZE]; /* area for reading file */ +} LoadF; + + +static const char *getF (lua_State *L, void *ud, size_t *size) { + LoadF *lf = (LoadF *)ud; + (void)L; /* not used */ + if (lf->n > 0) { /* are there pre-read characters to be read? */ + *size = lf->n; /* return them (chars already in buffer) */ + lf->n = 0; /* no more pre-read characters */ + } + else { /* read a block from file */ + /* 'fread' can return > 0 *and* set the EOF flag. If next call to + 'getF' called 'fread', it might still wait for user input. + The next check avoids this problem. */ + if (feof(lf->f)) return NULL; + *size = fread(lf->buff, 1, sizeof(lf->buff), lf->f); /* read block */ + } + return lf->buff; +} + + +static int errfile (lua_State *L, const char *what, int fnameindex) { + const char *serr = strerror(errno); + const char *filename = lua_tostring(L, fnameindex) + 1; + lua_pushfstring(L, "cannot %s %s: %s", what, filename, serr); + lua_remove(L, fnameindex); + return LUA_ERRFILE; +} + + +static int skipBOM (LoadF *lf) { + const char *p = "\xEF\xBB\xBF"; /* Utf8 BOM mark */ + int c; + lf->n = 0; + do { + c = getc(lf->f); + if (c == EOF || c != *(const unsigned char *)p++) return c; + lf->buff[lf->n++] = c; /* to be read by the parser */ + } while (*p != '\0'); + lf->n = 0; /* prefix matched; discard it */ + return getc(lf->f); /* return next character */ +} + + +/* +** reads the first character of file 'f' and skips an optional BOM mark +** in its beginning plus its first line if it starts with '#'. Returns +** true if it skipped the first line. 
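+** (Aside — minimal use of the reference system above, keeping a Lua
+** value alive from C and releasing it later:
+**
+**   int r = luaL_ref(L, LUA_REGISTRYINDEX);   // pops value, returns handle
+**   lua_rawgeti(L, LUA_REGISTRYINDEX, r);     // push it back later
+**   luaL_unref(L, LUA_REGISTRYINDEX, r);      // free the slot for reuse
+** )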
In any case, '*cp' has the +** first "valid" character of the file (after the optional BOM and +** a first-line comment). +*/ +static int skipcomment (LoadF *lf, int *cp) { + int c = *cp = skipBOM(lf); + if (c == '#') { /* first line is a comment (Unix exec. file)? */ + do { /* skip first line */ + c = getc(lf->f); + } while (c != EOF && c != '\n') ; + *cp = getc(lf->f); /* skip end-of-line, if present */ + return 1; /* there was a comment */ + } + else return 0; /* no comment */ +} + + +LUALIB_API int luaL_loadfilex (lua_State *L, const char *filename, + const char *mode) { + LoadF lf; + int status, readstatus; + int c; + int fnameindex = lua_gettop(L) + 1; /* index of filename on the stack */ + if (filename == NULL) { + lua_pushliteral(L, "=stdin"); + lf.f = stdin; + } + else { + lua_pushfstring(L, "@%s", filename); + lf.f = fopen(filename, "r"); + if (lf.f == NULL) return errfile(L, "open", fnameindex); + } + if (skipcomment(&lf, &c)) /* read initial portion */ + lf.buff[lf.n++] = '\n'; /* add line to correct line numbers */ + if (c == LUA_SIGNATURE[0] && filename) { /* binary file? */ + lf.f = freopen(filename, "rb", lf.f); /* reopen in binary mode */ + if (lf.f == NULL) return errfile(L, "reopen", fnameindex); + skipcomment(&lf, &c); /* re-read initial portion */ + } + if (c != EOF) + lf.buff[lf.n++] = c; /* 'c' is the first character of the stream */ + status = lua_load(L, getF, &lf, lua_tostring(L, -1), mode); + readstatus = ferror(lf.f); + if (filename) fclose(lf.f); /* close file (even in case of errors) */ + if (readstatus) { + lua_settop(L, fnameindex); /* ignore results from `lua_load' */ + return errfile(L, "read", fnameindex); + } + lua_remove(L, fnameindex); + return status; +} + + +typedef struct LoadS { + const char *s; + size_t size; +} LoadS; + + +static const char *getS (lua_State *L, void *ud, size_t *size) { + LoadS *ls = (LoadS *)ud; + (void)L; /* not used */ + if (ls->size == 0) return NULL; + *size = ls->size; + ls->size = 0; + return ls->s; +} + + +LUALIB_API int luaL_loadbufferx (lua_State *L, const char *buff, size_t size, + const char *name, const char *mode) { + LoadS ls; + ls.s = buff; + ls.size = size; + return lua_load(L, getS, &ls, name, mode); +} + + +LUALIB_API int luaL_loadstring (lua_State *L, const char *s) { + return luaL_loadbuffer(L, s, strlen(s), s); +} + +/* }====================================================== */ + + + +LUALIB_API int luaL_getmetafield (lua_State *L, int obj, const char *event) { + if (!lua_getmetatable(L, obj)) /* no metatable? */ + return 0; + lua_pushstring(L, event); + lua_rawget(L, -2); + if (lua_isnil(L, -1)) { + lua_pop(L, 2); /* remove metatable and metafield */ + return 0; + } + else { + lua_remove(L, -2); /* remove only metatable */ + return 1; + } +} + + +LUALIB_API int luaL_callmeta (lua_State *L, int obj, const char *event) { + obj = lua_absindex(L, obj); + if (!luaL_getmetafield(L, obj, event)) /* no metafield? */ + return 0; + lua_pushvalue(L, obj); + lua_call(L, 1, 1); + return 1; +} + + +LUALIB_API int luaL_len (lua_State *L, int idx) { + int l; + int isnum; + lua_len(L, idx); + l = (int)lua_tointegerx(L, -1, &isnum); + if (!isnum) + luaL_error(L, "object length is not a number"); + lua_pop(L, 1); /* remove object */ + return l; +} + + +LUALIB_API const char *luaL_tolstring (lua_State *L, int idx, size_t *len) { + if (!luaL_callmeta(L, idx, "__tostring")) { /* no metafield? 
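+  ** (Aside — driving the load functions above from a host, a sketch:
+  **
+  **   if (luaL_loadbuffer(L, "print 'hi'", 10, "=chunk") != LUA_OK ||
+  **       lua_pcall(L, 0, LUA_MULTRET, 0) != LUA_OK)
+  **     fprintf(stderr, "%s\n", lua_tostring(L, -1));  // report error
+  ** )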
*/ + switch (lua_type(L, idx)) { + case LUA_TNUMBER: + case LUA_TSTRING: + lua_pushvalue(L, idx); + break; + case LUA_TBOOLEAN: + lua_pushstring(L, (lua_toboolean(L, idx) ? "true" : "false")); + break; + case LUA_TNIL: + lua_pushliteral(L, "nil"); + break; + default: + lua_pushfstring(L, "%s: %p", luaL_typename(L, idx), + lua_topointer(L, idx)); + break; + } + } + return lua_tolstring(L, -1, len); +} + + +/* +** {====================================================== +** Compatibility with 5.1 module functions +** ======================================================= +*/ +#if defined(LUA_COMPAT_MODULE) + +static const char *luaL_findtable (lua_State *L, int idx, + const char *fname, int szhint) { + const char *e; + if (idx) lua_pushvalue(L, idx); + do { + e = strchr(fname, '.'); + if (e == NULL) e = fname + strlen(fname); + lua_pushlstring(L, fname, e - fname); + lua_rawget(L, -2); + if (lua_isnil(L, -1)) { /* no such field? */ + lua_pop(L, 1); /* remove this nil */ + lua_createtable(L, 0, (*e == '.' ? 1 : szhint)); /* new table for field */ + lua_pushlstring(L, fname, e - fname); + lua_pushvalue(L, -2); + lua_settable(L, -4); /* set new table into field */ + } + else if (!lua_istable(L, -1)) { /* field has a non-table value? */ + lua_pop(L, 2); /* remove table and value */ + return fname; /* return problematic part of the name */ + } + lua_remove(L, -2); /* remove previous table */ + fname = e + 1; + } while (*e == '.'); + return NULL; +} + + +/* +** Count number of elements in a luaL_Reg list. +*/ +static int libsize (const luaL_Reg *l) { + int size = 0; + for (; l && l->name; l++) size++; + return size; +} + + +/* +** Find or create a module table with a given name. The function +** first looks at the _LOADED table and, if that fails, try a +** global variable with that name. In any case, leaves on the stack +** the module table. +*/ +LUALIB_API void luaL_pushmodule (lua_State *L, const char *modname, + int sizehint) { + luaL_findtable(L, LUA_REGISTRYINDEX, "_LOADED", 1); /* get _LOADED table */ + lua_getfield(L, -1, modname); /* get _LOADED[modname] */ + if (!lua_istable(L, -1)) { /* not found? */ + lua_pop(L, 1); /* remove previous result */ + /* try global variable (and create one if it does not exist) */ + lua_pushglobaltable(L); + if (luaL_findtable(L, 0, modname, sizehint) != NULL) + luaL_error(L, "name conflict for module " LUA_QS, modname); + lua_pushvalue(L, -1); + lua_setfield(L, -3, modname); /* _LOADED[modname] = new table */ + } + lua_remove(L, -2); /* remove _LOADED table */ +} + + +LUALIB_API void luaL_openlib (lua_State *L, const char *libname, + const luaL_Reg *l, int nup) { + luaL_checkversion(L); + if (libname) { + luaL_pushmodule(L, libname, libsize(l)); /* get/create library table */ + lua_insert(L, -(nup + 1)); /* move library table to below upvalues */ + } + if (l) + luaL_setfuncs(L, l, nup); + else + lua_pop(L, nup); /* remove upvalues */ +} + +#endif +/* }====================================================== */ + +/* +** set functions from list 'l' into table at top - 'nup'; each +** function gets the 'nup' elements at the top as upvalues. +** Returns with only the table at the stack. 
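+** A sketch of typical use (the names 'mylib', 'lib_inc' and 'lib_get'
+** are illustrative):
+**
+**   static const luaL_Reg mylib[] = {
+**     {"inc", lib_inc}, {"get", lib_get}, {NULL, NULL}
+**   };
+**   luaL_newlibtable(L, mylib);  // table sized for the module
+**   lua_pushinteger(L, 0);       // a shared upvalue for all functions
+**   luaL_setfuncs(L, mylib, 1);  // register; each gets it as upvalue 1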
+*/ +LUALIB_API void luaL_setfuncs (lua_State *L, const luaL_Reg *l, int nup) { + luaL_checkversion(L); + luaL_checkstack(L, nup, "too many upvalues"); + for (; l->name != NULL; l++) { /* fill the table with given functions */ + int i; + for (i = 0; i < nup; i++) /* copy upvalues to the top */ + lua_pushvalue(L, -nup); + lua_pushcclosure(L, l->func, nup); /* closure with those upvalues */ + lua_setfield(L, -(nup + 2), l->name); + } + lua_pop(L, nup); /* remove upvalues */ +} + + +/* +** ensure that stack[idx][fname] has a table and push that table +** into the stack +*/ +LUALIB_API int luaL_getsubtable (lua_State *L, int idx, const char *fname) { + lua_getfield(L, idx, fname); + if (lua_istable(L, -1)) return 1; /* table already there */ + else { + lua_pop(L, 1); /* remove previous result */ + idx = lua_absindex(L, idx); + lua_newtable(L); + lua_pushvalue(L, -1); /* copy to be left at top */ + lua_setfield(L, idx, fname); /* assign new table to field */ + return 0; /* false, because did not find table there */ + } +} + + +/* +** stripped-down 'require'. Calls 'openf' to open a module, +** registers the result in 'package.loaded' table and, if 'glb' +** is true, also registers the result in the global table. +** Leaves resulting module on the top. +*/ +LUALIB_API void luaL_requiref (lua_State *L, const char *modname, + lua_CFunction openf, int glb) { + lua_pushcfunction(L, openf); + lua_pushstring(L, modname); /* argument to open function */ + lua_call(L, 1, 1); /* open module */ + luaL_getsubtable(L, LUA_REGISTRYINDEX, "_LOADED"); + lua_pushvalue(L, -2); /* make copy of module (call result) */ + lua_setfield(L, -2, modname); /* _LOADED[modname] = module */ + lua_pop(L, 1); /* remove _LOADED table */ + if (glb) { + lua_pushvalue(L, -1); /* copy of 'mod' */ + lua_setglobal(L, modname); /* _G[modname] = module */ + } +} + + +LUALIB_API const char *luaL_gsub (lua_State *L, const char *s, const char *p, + const char *r) { + const char *wild; + size_t l = strlen(p); + luaL_Buffer b; + luaL_buffinit(L, &b); + while ((wild = strstr(s, p)) != NULL) { + luaL_addlstring(&b, s, wild - s); /* push prefix */ + luaL_addstring(&b, r); /* push replacement in place of pattern */ + s = wild + l; /* continue after `p' */ + } + luaL_addstring(&b, s); /* push last suffix */ + luaL_pushresult(&b); + return lua_tostring(L, -1); +} + + +static void *l_alloc (void *ud, void *ptr, size_t osize, size_t nsize) { + (void)ud; (void)osize; /* not used */ + if (nsize == 0) { + free(ptr); + return NULL; + } + else + return realloc(ptr, nsize); +} + + +static int panic (lua_State *L) { + luai_writestringerror("PANIC: unprotected error in call to Lua API (%s)\n", + lua_tostring(L, -1)); + return 0; /* return to Lua to abort */ +} + + +LUALIB_API lua_State *luaL_newstate (void) { + lua_State *L = lua_newstate(l_alloc, NULL); + if (L) lua_atpanic(L, &panic); + return L; +} + + +LUALIB_API void luaL_checkversion_ (lua_State *L, lua_Number ver) { + const lua_Number *v = lua_version(L); + if (v != lua_version(NULL)) + luaL_error(L, "multiple Lua VMs detected"); + else if (*v != ver) + luaL_error(L, "version mismatch: app. 
needs %f, Lua core provides %f",
+                  ver, *v);
+  /* check conversions number -> integer types */
+  lua_pushnumber(L, -(lua_Number)0x1234);
+  if (lua_tointeger(L, -1) != -0x1234 ||
+      lua_tounsigned(L, -1) != (lua_Unsigned)-0x1234)
+    luaL_error(L, "bad conversion number->int;"
+                  " must recompile Lua with proper settings");
+  lua_pop(L, 1);
+}
+
diff --git a/ext/lua/src/lbaselib.c b/ext/lua/src/lbaselib.c
new file mode 100644
index 000000000..540e9a5cc
--- /dev/null
+++ b/ext/lua/src/lbaselib.c
@@ -0,0 +1,458 @@
+/*
+** $Id: lbaselib.c,v 1.276 2013/02/21 13:44:53 roberto Exp $
+** Basic library
+** See Copyright Notice in lua.h
+*/
+
+
+
+#include <ctype.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define lbaselib_c
+#define LUA_LIB
+
+#include "lua.h"
+
+#include "lauxlib.h"
+#include "lualib.h"
+
+
+static int luaB_print (lua_State *L) {
+  int n = lua_gettop(L);  /* number of arguments */
+  int i;
+  lua_getglobal(L, "tostring");
+  for (i=1; i<=n; i++) {
+    const char *s;
+    size_t l;
+    lua_pushvalue(L, -1);  /* function to be called */
+    lua_pushvalue(L, i);   /* value to print */
+    lua_call(L, 1, 1);
+    s = lua_tolstring(L, -1, &l);  /* get result */
+    if (s == NULL)
+      return luaL_error(L,
+         LUA_QL("tostring") " must return a string to " LUA_QL("print"));
+    if (i>1) luai_writestring("\t", 1);
+    luai_writestring(s, l);
+    lua_pop(L, 1);  /* pop result */
+  }
+  luai_writeline();
+  return 0;
+}
+
+
+#define SPACECHARS  " \f\n\r\t\v"
+
+static int luaB_tonumber (lua_State *L) {
+  if (lua_isnoneornil(L, 2)) {  /* standard conversion */
+    int isnum;
+    lua_Number n = lua_tonumberx(L, 1, &isnum);
+    if (isnum) {
+      lua_pushnumber(L, n);
+      return 1;
+    }  /* else not a number; must be something */
+    luaL_checkany(L, 1);
+  }
+  else {
+    size_t l;
+    const char *s = luaL_checklstring(L, 1, &l);
+    const char *e = s + l;  /* end point for 's' */
+    int base = luaL_checkint(L, 2);
+    int neg = 0;
+    luaL_argcheck(L, 2 <= base && base <= 36, 2, "base out of range");
+    s += strspn(s, SPACECHARS);  /* skip initial spaces */
+    if (*s == '-') { s++; neg = 1; }  /* handle signal */
+    else if (*s == '+') s++;
+    if (isalnum((unsigned char)*s)) {
+      lua_Number n = 0;
+      do {
+        int digit = (isdigit((unsigned char)*s)) ? *s - '0'
+                       : toupper((unsigned char)*s) - 'A' + 10;
+        if (digit >= base) break;  /* invalid numeral; force a fail */
+        n = n * (lua_Number)base + (lua_Number)digit;
+        s++;
+      } while (isalnum((unsigned char)*s));
+      s += strspn(s, SPACECHARS);  /* skip trailing spaces */
+      if (s == e) {  /* no invalid trailing characters? */
+        lua_pushnumber(L, (neg) ? -n : n);
+        return 1;
+      }  /* else not a number */
+    }  /* else not a number */
+  }
+  lua_pushnil(L);  /* not a number */
+  return 1;
+}
+
+
+static int luaB_error (lua_State *L) {
+  int level = luaL_optint(L, 2, 1);
+  lua_settop(L, 1);
+  if (lua_isstring(L, 1) && level > 0) {  /* add extra information? 
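+  ** (Aside — a worked example of the 'tonumber' loop above:
+  ** tonumber("  7f ", 16) skips the leading spaces, folds the digits as
+  ** n = 7*16 + 15 = 127, finds only spaces up to the end, and returns
+  ** 127; tonumber("7g", 16) stops at 'g' (digit value 16 >= base), so
+  ** 's' never reaches 'e' and the call returns nil.)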
*/ + luaL_where(L, level); + lua_pushvalue(L, 1); + lua_concat(L, 2); + } + return lua_error(L); +} + + +static int luaB_getmetatable (lua_State *L) { + luaL_checkany(L, 1); + if (!lua_getmetatable(L, 1)) { + lua_pushnil(L); + return 1; /* no metatable */ + } + luaL_getmetafield(L, 1, "__metatable"); + return 1; /* returns either __metatable field (if present) or metatable */ +} + + +static int luaB_setmetatable (lua_State *L) { + int t = lua_type(L, 2); + luaL_checktype(L, 1, LUA_TTABLE); + luaL_argcheck(L, t == LUA_TNIL || t == LUA_TTABLE, 2, + "nil or table expected"); + if (luaL_getmetafield(L, 1, "__metatable")) + return luaL_error(L, "cannot change a protected metatable"); + lua_settop(L, 2); + lua_setmetatable(L, 1); + return 1; +} + + +static int luaB_rawequal (lua_State *L) { + luaL_checkany(L, 1); + luaL_checkany(L, 2); + lua_pushboolean(L, lua_rawequal(L, 1, 2)); + return 1; +} + + +static int luaB_rawlen (lua_State *L) { + int t = lua_type(L, 1); + luaL_argcheck(L, t == LUA_TTABLE || t == LUA_TSTRING, 1, + "table or string expected"); + lua_pushinteger(L, lua_rawlen(L, 1)); + return 1; +} + + +static int luaB_rawget (lua_State *L) { + luaL_checktype(L, 1, LUA_TTABLE); + luaL_checkany(L, 2); + lua_settop(L, 2); + lua_rawget(L, 1); + return 1; +} + +static int luaB_rawset (lua_State *L) { + luaL_checktype(L, 1, LUA_TTABLE); + luaL_checkany(L, 2); + luaL_checkany(L, 3); + lua_settop(L, 3); + lua_rawset(L, 1); + return 1; +} + + +static int luaB_collectgarbage (lua_State *L) { + static const char *const opts[] = {"stop", "restart", "collect", + "count", "step", "setpause", "setstepmul", + "setmajorinc", "isrunning", "generational", "incremental", NULL}; + static const int optsnum[] = {LUA_GCSTOP, LUA_GCRESTART, LUA_GCCOLLECT, + LUA_GCCOUNT, LUA_GCSTEP, LUA_GCSETPAUSE, LUA_GCSETSTEPMUL, + LUA_GCSETMAJORINC, LUA_GCISRUNNING, LUA_GCGEN, LUA_GCINC}; + int o = optsnum[luaL_checkoption(L, 1, "collect", opts)]; + int ex = luaL_optint(L, 2, 0); + int res = lua_gc(L, o, ex); + switch (o) { + case LUA_GCCOUNT: { + int b = lua_gc(L, LUA_GCCOUNTB, 0); + lua_pushnumber(L, res + ((lua_Number)b/1024)); + lua_pushinteger(L, b); + return 2; + } + case LUA_GCSTEP: case LUA_GCISRUNNING: { + lua_pushboolean(L, res); + return 1; + } + default: { + lua_pushinteger(L, res); + return 1; + } + } +} + + +static int luaB_type (lua_State *L) { + luaL_checkany(L, 1); + lua_pushstring(L, luaL_typename(L, 1)); + return 1; +} + + +static int pairsmeta (lua_State *L, const char *method, int iszero, + lua_CFunction iter) { + if (!luaL_getmetafield(L, 1, method)) { /* no metamethod? */ + luaL_checktype(L, 1, LUA_TTABLE); /* argument must be a table */ + lua_pushcfunction(L, iter); /* will return generator, */ + lua_pushvalue(L, 1); /* state, */ + if (iszero) lua_pushinteger(L, 0); /* and initial value */ + else lua_pushnil(L); + } + else { + lua_pushvalue(L, 1); /* argument 'self' to metamethod */ + lua_call(L, 1, 3); /* get 3 values from metamethod */ + } + return 3; +} + + +static int luaB_next (lua_State *L) { + luaL_checktype(L, 1, LUA_TTABLE); + lua_settop(L, 2); /* create a 2nd argument if there isn't one */ + if (lua_next(L, 1)) + return 2; + else { + lua_pushnil(L); + return 1; + } +} + + +static int luaB_pairs (lua_State *L) { + return pairsmeta(L, "__pairs", 0, luaB_next); +} + + +static int ipairsaux (lua_State *L) { + int i = luaL_checkint(L, 2); + luaL_checktype(L, 1, LUA_TTABLE); + i++; /* next value */ + lua_pushinteger(L, i); + lua_rawgeti(L, 1, i); + return (lua_isnil(L, -1)) ? 
1 : 2; +} + + +static int luaB_ipairs (lua_State *L) { + return pairsmeta(L, "__ipairs", 1, ipairsaux); +} + + +static int load_aux (lua_State *L, int status, int envidx) { + if (status == LUA_OK) { + if (envidx != 0) { /* 'env' parameter? */ + lua_pushvalue(L, envidx); /* environment for loaded function */ + if (!lua_setupvalue(L, -2, 1)) /* set it as 1st upvalue */ + lua_pop(L, 1); /* remove 'env' if not used by previous call */ + } + return 1; + } + else { /* error (message is on top of the stack) */ + lua_pushnil(L); + lua_insert(L, -2); /* put before error message */ + return 2; /* return nil plus error message */ + } +} + + +static int luaB_loadfile (lua_State *L) { + const char *fname = luaL_optstring(L, 1, NULL); + const char *mode = luaL_optstring(L, 2, NULL); + int env = (!lua_isnone(L, 3) ? 3 : 0); /* 'env' index or 0 if no 'env' */ + int status = luaL_loadfilex(L, fname, mode); + return load_aux(L, status, env); +} + + +/* +** {====================================================== +** Generic Read function +** ======================================================= +*/ + + +/* +** reserved slot, above all arguments, to hold a copy of the returned +** string to avoid it being collected while parsed. 'load' has four +** optional arguments (chunk, source name, mode, and environment). +*/ +#define RESERVEDSLOT 5 + + +/* +** Reader for generic `load' function: `lua_load' uses the +** stack for internal stuff, so the reader cannot change the +** stack top. Instead, it keeps its resulting string in a +** reserved slot inside the stack. +*/ +static const char *generic_reader (lua_State *L, void *ud, size_t *size) { + (void)(ud); /* not used */ + luaL_checkstack(L, 2, "too many nested functions"); + lua_pushvalue(L, 1); /* get function */ + lua_call(L, 0, 1); /* call it */ + if (lua_isnil(L, -1)) { + lua_pop(L, 1); /* pop result */ + *size = 0; + return NULL; + } + else if (!lua_isstring(L, -1)) + luaL_error(L, "reader function must return a string"); + lua_replace(L, RESERVEDSLOT); /* save string in reserved slot */ + return lua_tolstring(L, RESERVEDSLOT, size); +} + + +static int luaB_load (lua_State *L) { + int status; + size_t l; + const char *s = lua_tolstring(L, 1, &l); + const char *mode = luaL_optstring(L, 3, "bt"); + int env = (!lua_isnone(L, 4) ? 4 : 0); /* 'env' index or 0 if no 'env' */ + if (s != NULL) { /* loading a string? 
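+  ** (Aside — the reader protocol used by 'generic_reader' above, from
+  ** the C side; a sketch that feeds lua_load one whole string, with
+  ** 'reader' illustrative:
+  **
+  **   static const char *reader (lua_State *L, void *ud, size_t *size) {
+  **     const char **src = (const char **)ud;
+  **     const char *s = *src;
+  **     if (s == NULL) return NULL;  // signal end of chunk
+  **     *size = strlen(s);
+  **     *src = NULL;                 // next call reports end
+  **     return s;
+  **   }
+  ** )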
*/ + const char *chunkname = luaL_optstring(L, 2, s); + status = luaL_loadbufferx(L, s, l, chunkname, mode); + } + else { /* loading from a reader function */ + const char *chunkname = luaL_optstring(L, 2, "=(load)"); + luaL_checktype(L, 1, LUA_TFUNCTION); + lua_settop(L, RESERVEDSLOT); /* create reserved slot */ + status = lua_load(L, generic_reader, NULL, chunkname, mode); + } + return load_aux(L, status, env); +} + +/* }====================================================== */ + + +static int dofilecont (lua_State *L) { + return lua_gettop(L) - 1; +} + + +static int luaB_dofile (lua_State *L) { + const char *fname = luaL_optstring(L, 1, NULL); + lua_settop(L, 1); + if (luaL_loadfile(L, fname) != LUA_OK) + return lua_error(L); + lua_callk(L, 0, LUA_MULTRET, 0, dofilecont); + return dofilecont(L); +} + + +static int luaB_assert (lua_State *L) { + if (!lua_toboolean(L, 1)) + return luaL_error(L, "%s", luaL_optstring(L, 2, "assertion failed!")); + return lua_gettop(L); +} + + +static int luaB_select (lua_State *L) { + int n = lua_gettop(L); + if (lua_type(L, 1) == LUA_TSTRING && *lua_tostring(L, 1) == '#') { + lua_pushinteger(L, n-1); + return 1; + } + else { + int i = luaL_checkint(L, 1); + if (i < 0) i = n + i; + else if (i > n) i = n; + luaL_argcheck(L, 1 <= i, 1, "index out of range"); + return n - i; + } +} + + +static int finishpcall (lua_State *L, int status) { + if (!lua_checkstack(L, 1)) { /* no space for extra boolean? */ + lua_settop(L, 0); /* create space for return values */ + lua_pushboolean(L, 0); + lua_pushstring(L, "stack overflow"); + return 2; /* return false, msg */ + } + lua_pushboolean(L, status); /* first result (status) */ + lua_replace(L, 1); /* put first result in first slot */ + return lua_gettop(L); +} + + +static int pcallcont (lua_State *L) { + int status = lua_getctx(L, NULL); + return finishpcall(L, (status == LUA_YIELD)); +} + + +static int luaB_pcall (lua_State *L) { + int status; + luaL_checkany(L, 1); + lua_pushnil(L); + lua_insert(L, 1); /* create space for status result */ + status = lua_pcallk(L, lua_gettop(L) - 2, LUA_MULTRET, 0, 0, pcallcont); + return finishpcall(L, (status == LUA_OK)); +} + + +static int luaB_xpcall (lua_State *L) { + int status; + int n = lua_gettop(L); + luaL_argcheck(L, n >= 2, 2, "value expected"); + lua_pushvalue(L, 1); /* exchange function... 
*/ + lua_copy(L, 2, 1); /* ...and error handler */ + lua_replace(L, 2); + status = lua_pcallk(L, n - 2, LUA_MULTRET, 1, 0, pcallcont); + return finishpcall(L, (status == LUA_OK)); +} + + +static int luaB_tostring (lua_State *L) { + luaL_checkany(L, 1); + luaL_tolstring(L, 1, NULL); + return 1; +} + + +static const luaL_Reg base_funcs[] = { + {"assert", luaB_assert}, + {"collectgarbage", luaB_collectgarbage}, + {"dofile", luaB_dofile}, + {"error", luaB_error}, + {"getmetatable", luaB_getmetatable}, + {"ipairs", luaB_ipairs}, + {"loadfile", luaB_loadfile}, + {"load", luaB_load}, +#if defined(LUA_COMPAT_LOADSTRING) + {"loadstring", luaB_load}, +#endif + {"next", luaB_next}, + {"pairs", luaB_pairs}, + {"pcall", luaB_pcall}, + {"print", luaB_print}, + {"rawequal", luaB_rawequal}, + {"rawlen", luaB_rawlen}, + {"rawget", luaB_rawget}, + {"rawset", luaB_rawset}, + {"select", luaB_select}, + {"setmetatable", luaB_setmetatable}, + {"tonumber", luaB_tonumber}, + {"tostring", luaB_tostring}, + {"type", luaB_type}, + {"xpcall", luaB_xpcall}, + {NULL, NULL} +}; + + +LUAMOD_API int luaopen_base (lua_State *L) { + /* set global _G */ + lua_pushglobaltable(L); + lua_pushglobaltable(L); + lua_setfield(L, -2, "_G"); + /* open lib into global table */ + luaL_setfuncs(L, base_funcs, 0); + lua_pushliteral(L, LUA_VERSION); + lua_setfield(L, -2, "_VERSION"); /* set global _VERSION */ + return 1; +} + diff --git a/ext/lua/src/lbitlib.c b/ext/lua/src/lbitlib.c new file mode 100644 index 000000000..9637532e3 --- /dev/null +++ b/ext/lua/src/lbitlib.c @@ -0,0 +1,211 @@ +/* +** $Id: lbitlib.c,v 1.18 2013/03/19 13:19:12 roberto Exp $ +** Standard library for bitwise operations +** See Copyright Notice in lua.h +*/ + +#define lbitlib_c +#define LUA_LIB + +#include "lua.h" + +#include "lauxlib.h" +#include "lualib.h" + + +/* number of bits to consider in a number */ +#if !defined(LUA_NBITS) +#define LUA_NBITS 32 +#endif + + +#define ALLONES (~(((~(lua_Unsigned)0) << (LUA_NBITS - 1)) << 1)) + +/* macro to trim extra bits */ +#define trim(x) ((x) & ALLONES) + + +/* builds a number with 'n' ones (1 <= n <= LUA_NBITS) */ +#define mask(n) (~((ALLONES << 1) << ((n) - 1))) + + +typedef lua_Unsigned b_uint; + + + +static b_uint andaux (lua_State *L) { + int i, n = lua_gettop(L); + b_uint r = ~(b_uint)0; + for (i = 1; i <= n; i++) + r &= luaL_checkunsigned(L, i); + return trim(r); +} + + +static int b_and (lua_State *L) { + b_uint r = andaux(L); + lua_pushunsigned(L, r); + return 1; +} + + +static int b_test (lua_State *L) { + b_uint r = andaux(L); + lua_pushboolean(L, r != 0); + return 1; +} + + +static int b_or (lua_State *L) { + int i, n = lua_gettop(L); + b_uint r = 0; + for (i = 1; i <= n; i++) + r |= luaL_checkunsigned(L, i); + lua_pushunsigned(L, trim(r)); + return 1; +} + + +static int b_xor (lua_State *L) { + int i, n = lua_gettop(L); + b_uint r = 0; + for (i = 1; i <= n; i++) + r ^= luaL_checkunsigned(L, i); + lua_pushunsigned(L, trim(r)); + return 1; +} + + +static int b_not (lua_State *L) { + b_uint r = ~luaL_checkunsigned(L, 1); + lua_pushunsigned(L, trim(r)); + return 1; +} + + +static int b_shift (lua_State *L, b_uint r, int i) { + if (i < 0) { /* shift right? 
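+  ** (Worked example, with LUA_NBITS == 32: ALLONES is 0xffffffff,
+  ** mask(4) is 0x0000000f, and trim(x) clears everything above bit 31
+  ** when lua_Unsigned is wider than 32 bits.  So bit32.band(0xff0,
+  ** 0x0ff) is 0xff0 & 0x0ff = 0x0f0.)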
*/
+    i = -i;
+    r = trim(r);
+    if (i >= LUA_NBITS) r = 0;
+    else r >>= i;
+  }
+  else {  /* shift left */
+    if (i >= LUA_NBITS) r = 0;
+    else r <<= i;
+    r = trim(r);
+  }
+  lua_pushunsigned(L, r);
+  return 1;
+}
+
+
+static int b_lshift (lua_State *L) {
+  return b_shift(L, luaL_checkunsigned(L, 1), luaL_checkint(L, 2));
+}
+
+
+static int b_rshift (lua_State *L) {
+  return b_shift(L, luaL_checkunsigned(L, 1), -luaL_checkint(L, 2));
+}
+
+
+static int b_arshift (lua_State *L) {
+  b_uint r = luaL_checkunsigned(L, 1);
+  int i = luaL_checkint(L, 2);
+  if (i < 0 || !(r & ((b_uint)1 << (LUA_NBITS - 1))))
+    return b_shift(L, r, -i);
+  else {  /* arithmetic shift for 'negative' number */
+    if (i >= LUA_NBITS) r = ALLONES;
+    else
+      r = trim((r >> i) | ~(~(b_uint)0 >> i));  /* add signal bit */
+    lua_pushunsigned(L, r);
+    return 1;
+  }
+}
+
+
+static int b_rot (lua_State *L, int i) {
+  b_uint r = luaL_checkunsigned(L, 1);
+  i &= (LUA_NBITS - 1);  /* i = i % NBITS */
+  r = trim(r);
+  r = (r << i) | (r >> (LUA_NBITS - i));
+  lua_pushunsigned(L, trim(r));
+  return 1;
+}
+
+
+static int b_lrot (lua_State *L) {
+  return b_rot(L, luaL_checkint(L, 2));
+}
+
+
+static int b_rrot (lua_State *L) {
+  return b_rot(L, -luaL_checkint(L, 2));
+}
+
+
+/*
+** get field and width arguments for field-manipulation functions,
+** checking whether they are valid.
+** ('luaL_error' called without 'return' to avoid later warnings about
+** 'width' being used uninitialized.)
+*/
+static int fieldargs (lua_State *L, int farg, int *width) {
+  int f = luaL_checkint(L, farg);
+  int w = luaL_optint(L, farg + 1, 1);
+  luaL_argcheck(L, 0 <= f, farg, "field cannot be negative");
+  luaL_argcheck(L, 0 < w, farg + 1, "width must be positive");
+  if (f + w > LUA_NBITS)
+    luaL_error(L, "trying to access non-existent bits");
+  *width = w;
+  return f;
+}
+
+
+static int b_extract (lua_State *L) {
+  int w;
+  b_uint r = luaL_checkunsigned(L, 1);
+  int f = fieldargs(L, 2, &w);
+  r = (r >> f) & mask(w);
+  lua_pushunsigned(L, r);
+  return 1;
+}
+
+
+static int b_replace (lua_State *L) {
+  int w;
+  b_uint r = luaL_checkunsigned(L, 1);
+  b_uint v = luaL_checkunsigned(L, 2);
+  int f = fieldargs(L, 3, &w);
+  int m = mask(w);
+  v &= m;  /* erase bits outside given width */
+  r = (r & ~(m << f)) | (v << f);
+  lua_pushunsigned(L, r);
+  return 1;
+}
+
+
+static const luaL_Reg bitlib[] = {
+  {"arshift", b_arshift},
+  {"band", b_and},
+  {"bnot", b_not},
+  {"bor", b_or},
+  {"bxor", b_xor},
+  {"btest", b_test},
+  {"extract", b_extract},
+  {"lrotate", b_lrot},
+  {"lshift", b_lshift},
+  {"replace", b_replace},
+  {"rrotate", b_rrot},
+  {"rshift", b_rshift},
+  {NULL, NULL}
+};
+
+
+
+LUAMOD_API int luaopen_bit32 (lua_State *L) {
+  luaL_newlib(L, bitlib);
+  return 1;
+}
+
diff --git a/ext/lua/src/lcode.c b/ext/lua/src/lcode.c
new file mode 100644
index 000000000..56c26ac8a
--- /dev/null
+++ b/ext/lua/src/lcode.c
@@ -0,0 +1,881 @@
+/*
+** $Id: lcode.c,v 2.62 2012/08/16 17:34:28 roberto Exp $
+** Code generator for Lua
+** See Copyright Notice in lua.h
+*/
+
+
+#include <stdlib.h>
+
+#define lcode_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "lcode.h"
+#include "ldebug.h"
+#include "ldo.h"
+#include "lgc.h"
+#include "llex.h"
+#include "lmem.h"
+#include "lobject.h"
+#include "lopcodes.h"
+#include "lparser.h"
+#include "lstring.h"
+#include "ltable.h"
+#include "lvm.h"
+
+
+#define hasjumps(e)  ((e)->t != (e)->f)
+
+
+static int isnumeral(expdesc *e) {
+  return (e->k == VKNUM && e->t == NO_JUMP && e->f == NO_JUMP);
+}
+
+
+void luaK_nil (FuncState *fs, int from, int n) {
+  
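+  /*
+  ** (Illustrative note: two adjacent declarations such as
+  ** 'local a,b  local c,d,e' produce a single OP_LOADNIL covering all
+  ** five registers, because the test below detects that the new range
+  ** connects to the previous instruction's range and merely widens its
+  ** A/B operands instead of emitting a second instruction.)
+  */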
Instruction *previous; + int l = from + n - 1; /* last register to set nil */ + if (fs->pc > fs->lasttarget) { /* no jumps to current position? */ + previous = &fs->f->code[fs->pc-1]; + if (GET_OPCODE(*previous) == OP_LOADNIL) { + int pfrom = GETARG_A(*previous); + int pl = pfrom + GETARG_B(*previous); + if ((pfrom <= from && from <= pl + 1) || + (from <= pfrom && pfrom <= l + 1)) { /* can connect both? */ + if (pfrom < from) from = pfrom; /* from = min(from, pfrom) */ + if (pl > l) l = pl; /* l = max(l, pl) */ + SETARG_A(*previous, from); + SETARG_B(*previous, l - from); + return; + } + } /* else go through */ + } + luaK_codeABC(fs, OP_LOADNIL, from, n - 1, 0); /* else no optimization */ +} + + +int luaK_jump (FuncState *fs) { + int jpc = fs->jpc; /* save list of jumps to here */ + int j; + fs->jpc = NO_JUMP; + j = luaK_codeAsBx(fs, OP_JMP, 0, NO_JUMP); + luaK_concat(fs, &j, jpc); /* keep them on hold */ + return j; +} + + +void luaK_ret (FuncState *fs, int first, int nret) { + luaK_codeABC(fs, OP_RETURN, first, nret+1, 0); +} + + +static int condjump (FuncState *fs, OpCode op, int A, int B, int C) { + luaK_codeABC(fs, op, A, B, C); + return luaK_jump(fs); +} + + +static void fixjump (FuncState *fs, int pc, int dest) { + Instruction *jmp = &fs->f->code[pc]; + int offset = dest-(pc+1); + lua_assert(dest != NO_JUMP); + if (abs(offset) > MAXARG_sBx) + luaX_syntaxerror(fs->ls, "control structure too long"); + SETARG_sBx(*jmp, offset); +} + + +/* +** returns current `pc' and marks it as a jump target (to avoid wrong +** optimizations with consecutive instructions not in the same basic block). +*/ +int luaK_getlabel (FuncState *fs) { + fs->lasttarget = fs->pc; + return fs->pc; +} + + +static int getjump (FuncState *fs, int pc) { + int offset = GETARG_sBx(fs->f->code[pc]); + if (offset == NO_JUMP) /* point to itself represents end of list */ + return NO_JUMP; /* end of list */ + else + return (pc+1)+offset; /* turn offset into absolute position */ +} + + +static Instruction *getjumpcontrol (FuncState *fs, int pc) { + Instruction *pi = &fs->f->code[pc]; + if (pc >= 1 && testTMode(GET_OPCODE(*(pi-1)))) + return pi-1; + else + return pi; +} + + +/* +** check whether list has any jump that do not produce a value +** (or produce an inverted value) +*/ +static int need_value (FuncState *fs, int list) { + for (; list != NO_JUMP; list = getjump(fs, list)) { + Instruction i = *getjumpcontrol(fs, list); + if (GET_OPCODE(i) != OP_TESTSET) return 1; + } + return 0; /* not found */ +} + + +static int patchtestreg (FuncState *fs, int node, int reg) { + Instruction *i = getjumpcontrol(fs, node); + if (GET_OPCODE(*i) != OP_TESTSET) + return 0; /* cannot patch other instructions */ + if (reg != NO_REG && reg != GETARG_B(*i)) + SETARG_A(*i, reg); + else /* no register to put value or register already has the value */ + *i = CREATE_ABC(OP_TEST, GETARG_B(*i), 0, GETARG_C(*i)); + + return 1; +} + + +static void removevalues (FuncState *fs, int list) { + for (; list != NO_JUMP; list = getjump(fs, list)) + patchtestreg(fs, list, NO_REG); +} + + +static void patchlistaux (FuncState *fs, int list, int vtarget, int reg, + int dtarget) { + while (list != NO_JUMP) { + int next = getjump(fs, list); + if (patchtestreg(fs, list, reg)) + fixjump(fs, list, vtarget); + else + fixjump(fs, list, dtarget); /* jump to default target */ + list = next; + } +} + + +static void dischargejpc (FuncState *fs) { + patchlistaux(fs, fs->jpc, fs->pc, NO_REG, fs->pc); + fs->jpc = NO_JUMP; +} + + +void luaK_patchlist (FuncState *fs, int list, 
int target) { + if (target == fs->pc) + luaK_patchtohere(fs, list); + else { + lua_assert(target < fs->pc); + patchlistaux(fs, list, target, NO_REG, target); + } +} + + +LUAI_FUNC void luaK_patchclose (FuncState *fs, int list, int level) { + level++; /* argument is +1 to reserve 0 as non-op */ + while (list != NO_JUMP) { + int next = getjump(fs, list); + lua_assert(GET_OPCODE(fs->f->code[list]) == OP_JMP && + (GETARG_A(fs->f->code[list]) == 0 || + GETARG_A(fs->f->code[list]) >= level)); + SETARG_A(fs->f->code[list], level); + list = next; + } +} + + +void luaK_patchtohere (FuncState *fs, int list) { + luaK_getlabel(fs); + luaK_concat(fs, &fs->jpc, list); +} + + +void luaK_concat (FuncState *fs, int *l1, int l2) { + if (l2 == NO_JUMP) return; + else if (*l1 == NO_JUMP) + *l1 = l2; + else { + int list = *l1; + int next; + while ((next = getjump(fs, list)) != NO_JUMP) /* find last element */ + list = next; + fixjump(fs, list, l2); + } +} + + +static int luaK_code (FuncState *fs, Instruction i) { + Proto *f = fs->f; + dischargejpc(fs); /* `pc' will change */ + /* put new instruction in code array */ + luaM_growvector(fs->ls->L, f->code, fs->pc, f->sizecode, Instruction, + MAX_INT, "opcodes"); + f->code[fs->pc] = i; + /* save corresponding line information */ + luaM_growvector(fs->ls->L, f->lineinfo, fs->pc, f->sizelineinfo, int, + MAX_INT, "opcodes"); + f->lineinfo[fs->pc] = fs->ls->lastline; + return fs->pc++; +} + + +int luaK_codeABC (FuncState *fs, OpCode o, int a, int b, int c) { + lua_assert(getOpMode(o) == iABC); + lua_assert(getBMode(o) != OpArgN || b == 0); + lua_assert(getCMode(o) != OpArgN || c == 0); + lua_assert(a <= MAXARG_A && b <= MAXARG_B && c <= MAXARG_C); + return luaK_code(fs, CREATE_ABC(o, a, b, c)); +} + + +int luaK_codeABx (FuncState *fs, OpCode o, int a, unsigned int bc) { + lua_assert(getOpMode(o) == iABx || getOpMode(o) == iAsBx); + lua_assert(getCMode(o) == OpArgN); + lua_assert(a <= MAXARG_A && bc <= MAXARG_Bx); + return luaK_code(fs, CREATE_ABx(o, a, bc)); +} + + +static int codeextraarg (FuncState *fs, int a) { + lua_assert(a <= MAXARG_Ax); + return luaK_code(fs, CREATE_Ax(OP_EXTRAARG, a)); +} + + +int luaK_codek (FuncState *fs, int reg, int k) { + if (k <= MAXARG_Bx) + return luaK_codeABx(fs, OP_LOADK, reg, k); + else { + int p = luaK_codeABx(fs, OP_LOADKX, reg, 0); + codeextraarg(fs, k); + return p; + } +} + + +void luaK_checkstack (FuncState *fs, int n) { + int newstack = fs->freereg + n; + if (newstack > fs->f->maxstacksize) { + if (newstack >= MAXSTACK) + luaX_syntaxerror(fs->ls, "function or expression too complex"); + fs->f->maxstacksize = cast_byte(newstack); + } +} + + +void luaK_reserveregs (FuncState *fs, int n) { + luaK_checkstack(fs, n); + fs->freereg += n; +} + + +static void freereg (FuncState *fs, int reg) { + if (!ISK(reg) && reg >= fs->nactvar) { + fs->freereg--; + lua_assert(reg == fs->freereg); + } +} + + +static void freeexp (FuncState *fs, expdesc *e) { + if (e->k == VNONRELOC) + freereg(fs, e->u.info); +} + + +static int addk (FuncState *fs, TValue *key, TValue *v) { + lua_State *L = fs->ls->L; + TValue *idx = luaH_set(L, fs->h, key); + Proto *f = fs->f; + int k, oldsize; + if (ttisnumber(idx)) { + lua_Number n = nvalue(idx); + lua_number2int(k, n); + if (luaV_rawequalobj(&f->k[k], v)) + return k; + /* else may be a collision (e.g., between 0.0 and "\0\0\0\0\0\0\0\0"); + go through and create a new entry for this value */ + } + /* constant not found; create a new entry */ + oldsize = f->sizek; + k = fs->nk; + /* numerical value does not need 
GC barrier; + table has no metatable, so it does not need to invalidate cache */ + setnvalue(idx, cast_num(k)); + luaM_growvector(L, f->k, k, f->sizek, TValue, MAXARG_Ax, "constants"); + while (oldsize < f->sizek) setnilvalue(&f->k[oldsize++]); + setobj(L, &f->k[k], v); + fs->nk++; + luaC_barrier(L, f, v); + return k; +} + + +int luaK_stringK (FuncState *fs, TString *s) { + TValue o; + setsvalue(fs->ls->L, &o, s); + return addk(fs, &o, &o); +} + + +int luaK_numberK (FuncState *fs, lua_Number r) { + int n; + lua_State *L = fs->ls->L; + TValue o; + setnvalue(&o, r); + if (r == 0 || luai_numisnan(NULL, r)) { /* handle -0 and NaN */ + /* use raw representation as key to avoid numeric problems */ + setsvalue(L, L->top++, luaS_newlstr(L, (char *)&r, sizeof(r))); + n = addk(fs, L->top - 1, &o); + L->top--; + } + else + n = addk(fs, &o, &o); /* regular case */ + return n; +} + + +static int boolK (FuncState *fs, int b) { + TValue o; + setbvalue(&o, b); + return addk(fs, &o, &o); +} + + +static int nilK (FuncState *fs) { + TValue k, v; + setnilvalue(&v); + /* cannot use nil as key; instead use table itself to represent nil */ + sethvalue(fs->ls->L, &k, fs->h); + return addk(fs, &k, &v); +} + + +void luaK_setreturns (FuncState *fs, expdesc *e, int nresults) { + if (e->k == VCALL) { /* expression is an open function call? */ + SETARG_C(getcode(fs, e), nresults+1); + } + else if (e->k == VVARARG) { + SETARG_B(getcode(fs, e), nresults+1); + SETARG_A(getcode(fs, e), fs->freereg); + luaK_reserveregs(fs, 1); + } +} + + +void luaK_setoneret (FuncState *fs, expdesc *e) { + if (e->k == VCALL) { /* expression is an open function call? */ + e->k = VNONRELOC; + e->u.info = GETARG_A(getcode(fs, e)); + } + else if (e->k == VVARARG) { + SETARG_B(getcode(fs, e), 2); + e->k = VRELOCABLE; /* can relocate its simple result */ + } +} + + +void luaK_dischargevars (FuncState *fs, expdesc *e) { + switch (e->k) { + case VLOCAL: { + e->k = VNONRELOC; + break; + } + case VUPVAL: { + e->u.info = luaK_codeABC(fs, OP_GETUPVAL, 0, e->u.info, 0); + e->k = VRELOCABLE; + break; + } + case VINDEXED: { + OpCode op = OP_GETTABUP; /* assume 't' is in an upvalue */ + freereg(fs, e->u.ind.idx); + if (e->u.ind.vt == VLOCAL) { /* 't' is in a register? */ + freereg(fs, e->u.ind.t); + op = OP_GETTABLE; + } + e->u.info = luaK_codeABC(fs, op, 0, e->u.ind.t, e->u.ind.idx); + e->k = VRELOCABLE; + break; + } + case VVARARG: + case VCALL: { + luaK_setoneret(fs, e); + break; + } + default: break; /* there is one value available (somewhere) */ + } +} + + +static int code_label (FuncState *fs, int A, int b, int jump) { + luaK_getlabel(fs); /* those instructions may be jump targets */ + return luaK_codeABC(fs, OP_LOADBOOL, A, b, jump); +} + + +static void discharge2reg (FuncState *fs, expdesc *e, int reg) { + luaK_dischargevars(fs, e); + switch (e->k) { + case VNIL: { + luaK_nil(fs, reg, 1); + break; + } + case VFALSE: case VTRUE: { + luaK_codeABC(fs, OP_LOADBOOL, reg, e->k == VTRUE, 0); + break; + } + case VK: { + luaK_codek(fs, reg, e->u.info); + break; + } + case VKNUM: { + luaK_codek(fs, reg, luaK_numberK(fs, e->u.nval)); + break; + } + case VRELOCABLE: { + Instruction *pc = &getcode(fs, e); + SETARG_A(*pc, reg); + break; + } + case VNONRELOC: { + if (reg != e->u.info) + luaK_codeABC(fs, OP_MOVE, reg, e->u.info, 0); + break; + } + default: { + lua_assert(e->k == VVOID || e->k == VJMP); + return; /* nothing to do... 
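+      ** (Aside on luaK_numberK above: 0.0 and -0.0 compare equal, so as
+      ** plain keys they would collide in the constant cache 'fs->h';
+      ** keying on the raw bytes of the double keeps them distinct.  The
+      ** same trick handles NaN, which can never be found as a table key
+      ** because NaN ~= NaN.)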
*/ + } + } + e->u.info = reg; + e->k = VNONRELOC; +} + + +static void discharge2anyreg (FuncState *fs, expdesc *e) { + if (e->k != VNONRELOC) { + luaK_reserveregs(fs, 1); + discharge2reg(fs, e, fs->freereg-1); + } +} + + +static void exp2reg (FuncState *fs, expdesc *e, int reg) { + discharge2reg(fs, e, reg); + if (e->k == VJMP) + luaK_concat(fs, &e->t, e->u.info); /* put this jump in `t' list */ + if (hasjumps(e)) { + int final; /* position after whole expression */ + int p_f = NO_JUMP; /* position of an eventual LOAD false */ + int p_t = NO_JUMP; /* position of an eventual LOAD true */ + if (need_value(fs, e->t) || need_value(fs, e->f)) { + int fj = (e->k == VJMP) ? NO_JUMP : luaK_jump(fs); + p_f = code_label(fs, reg, 0, 1); + p_t = code_label(fs, reg, 1, 0); + luaK_patchtohere(fs, fj); + } + final = luaK_getlabel(fs); + patchlistaux(fs, e->f, final, reg, p_f); + patchlistaux(fs, e->t, final, reg, p_t); + } + e->f = e->t = NO_JUMP; + e->u.info = reg; + e->k = VNONRELOC; +} + + +void luaK_exp2nextreg (FuncState *fs, expdesc *e) { + luaK_dischargevars(fs, e); + freeexp(fs, e); + luaK_reserveregs(fs, 1); + exp2reg(fs, e, fs->freereg - 1); +} + + +int luaK_exp2anyreg (FuncState *fs, expdesc *e) { + luaK_dischargevars(fs, e); + if (e->k == VNONRELOC) { + if (!hasjumps(e)) return e->u.info; /* exp is already in a register */ + if (e->u.info >= fs->nactvar) { /* reg. is not a local? */ + exp2reg(fs, e, e->u.info); /* put value on it */ + return e->u.info; + } + } + luaK_exp2nextreg(fs, e); /* default */ + return e->u.info; +} + + +void luaK_exp2anyregup (FuncState *fs, expdesc *e) { + if (e->k != VUPVAL || hasjumps(e)) + luaK_exp2anyreg(fs, e); +} + + +void luaK_exp2val (FuncState *fs, expdesc *e) { + if (hasjumps(e)) + luaK_exp2anyreg(fs, e); + else + luaK_dischargevars(fs, e); +} + + +int luaK_exp2RK (FuncState *fs, expdesc *e) { + luaK_exp2val(fs, e); + switch (e->k) { + case VTRUE: + case VFALSE: + case VNIL: { + if (fs->nk <= MAXINDEXRK) { /* constant fits in RK operand? */ + e->u.info = (e->k == VNIL) ? nilK(fs) : boolK(fs, (e->k == VTRUE)); + e->k = VK; + return RKASK(e->u.info); + } + else break; + } + case VKNUM: { + e->u.info = luaK_numberK(fs, e->u.nval); + e->k = VK; + /* go through */ + } + case VK: { + if (e->u.info <= MAXINDEXRK) /* constant fits in argC? */ + return RKASK(e->u.info); + else break; + } + default: break; + } + /* not a constant in the right range: put it in a register */ + return luaK_exp2anyreg(fs, e); +} + + +void luaK_storevar (FuncState *fs, expdesc *var, expdesc *ex) { + switch (var->k) { + case VLOCAL: { + freeexp(fs, ex); + exp2reg(fs, ex, var->u.info); + return; + } + case VUPVAL: { + int e = luaK_exp2anyreg(fs, ex); + luaK_codeABC(fs, OP_SETUPVAL, e, var->u.info, 0); + break; + } + case VINDEXED: { + OpCode op = (var->u.ind.vt == VLOCAL) ? 
OP_SETTABLE : OP_SETTABUP; + int e = luaK_exp2RK(fs, ex); + luaK_codeABC(fs, op, var->u.ind.t, var->u.ind.idx, e); + break; + } + default: { + lua_assert(0); /* invalid var kind to store */ + break; + } + } + freeexp(fs, ex); +} + + +void luaK_self (FuncState *fs, expdesc *e, expdesc *key) { + int ereg; + luaK_exp2anyreg(fs, e); + ereg = e->u.info; /* register where 'e' was placed */ + freeexp(fs, e); + e->u.info = fs->freereg; /* base register for op_self */ + e->k = VNONRELOC; + luaK_reserveregs(fs, 2); /* function and 'self' produced by op_self */ + luaK_codeABC(fs, OP_SELF, e->u.info, ereg, luaK_exp2RK(fs, key)); + freeexp(fs, key); +} + + +static void invertjump (FuncState *fs, expdesc *e) { + Instruction *pc = getjumpcontrol(fs, e->u.info); + lua_assert(testTMode(GET_OPCODE(*pc)) && GET_OPCODE(*pc) != OP_TESTSET && + GET_OPCODE(*pc) != OP_TEST); + SETARG_A(*pc, !(GETARG_A(*pc))); +} + + +static int jumponcond (FuncState *fs, expdesc *e, int cond) { + if (e->k == VRELOCABLE) { + Instruction ie = getcode(fs, e); + if (GET_OPCODE(ie) == OP_NOT) { + fs->pc--; /* remove previous OP_NOT */ + return condjump(fs, OP_TEST, GETARG_B(ie), 0, !cond); + } + /* else go through */ + } + discharge2anyreg(fs, e); + freeexp(fs, e); + return condjump(fs, OP_TESTSET, NO_REG, e->u.info, cond); +} + + +void luaK_goiftrue (FuncState *fs, expdesc *e) { + int pc; /* pc of last jump */ + luaK_dischargevars(fs, e); + switch (e->k) { + case VJMP: { + invertjump(fs, e); + pc = e->u.info; + break; + } + case VK: case VKNUM: case VTRUE: { + pc = NO_JUMP; /* always true; do nothing */ + break; + } + default: { + pc = jumponcond(fs, e, 0); + break; + } + } + luaK_concat(fs, &e->f, pc); /* insert last jump in `f' list */ + luaK_patchtohere(fs, e->t); + e->t = NO_JUMP; +} + + +void luaK_goiffalse (FuncState *fs, expdesc *e) { + int pc; /* pc of last jump */ + luaK_dischargevars(fs, e); + switch (e->k) { + case VJMP: { + pc = e->u.info; + break; + } + case VNIL: case VFALSE: { + pc = NO_JUMP; /* always false; do nothing */ + break; + } + default: { + pc = jumponcond(fs, e, 1); + break; + } + } + luaK_concat(fs, &e->t, pc); /* insert last jump in `t' list */ + luaK_patchtohere(fs, e->f); + e->f = NO_JUMP; +} + + +static void codenot (FuncState *fs, expdesc *e) { + luaK_dischargevars(fs, e); + switch (e->k) { + case VNIL: case VFALSE: { + e->k = VTRUE; + break; + } + case VK: case VKNUM: case VTRUE: { + e->k = VFALSE; + break; + } + case VJMP: { + invertjump(fs, e); + break; + } + case VRELOCABLE: + case VNONRELOC: { + discharge2anyreg(fs, e); + freeexp(fs, e); + e->u.info = luaK_codeABC(fs, OP_NOT, 0, e->u.info, 0); + e->k = VRELOCABLE; + break; + } + default: { + lua_assert(0); /* cannot happen */ + break; + } + } + /* interchange true and false lists */ + { int temp = e->f; e->f = e->t; e->t = temp; } + removevalues(fs, e->f); + removevalues(fs, e->t); +} + + +void luaK_indexed (FuncState *fs, expdesc *t, expdesc *k) { + lua_assert(!hasjumps(t)); + t->u.ind.t = t->u.info; + t->u.ind.idx = luaK_exp2RK(fs, k); + t->u.ind.vt = (t->k == VUPVAL) ? 
VUPVAL + : check_exp(vkisinreg(t->k), VLOCAL); + t->k = VINDEXED; +} + + +static int constfolding (OpCode op, expdesc *e1, expdesc *e2) { + lua_Number r; + if (!isnumeral(e1) || !isnumeral(e2)) return 0; + if ((op == OP_DIV || op == OP_MOD) && e2->u.nval == 0) + return 0; /* do not attempt to divide by 0 */ + r = luaO_arith(op - OP_ADD + LUA_OPADD, e1->u.nval, e2->u.nval); + e1->u.nval = r; + return 1; +} + + +static void codearith (FuncState *fs, OpCode op, + expdesc *e1, expdesc *e2, int line) { + if (constfolding(op, e1, e2)) + return; + else { + int o2 = (op != OP_UNM && op != OP_LEN) ? luaK_exp2RK(fs, e2) : 0; + int o1 = luaK_exp2RK(fs, e1); + if (o1 > o2) { + freeexp(fs, e1); + freeexp(fs, e2); + } + else { + freeexp(fs, e2); + freeexp(fs, e1); + } + e1->u.info = luaK_codeABC(fs, op, 0, o1, o2); + e1->k = VRELOCABLE; + luaK_fixline(fs, line); + } +} + + +static void codecomp (FuncState *fs, OpCode op, int cond, expdesc *e1, + expdesc *e2) { + int o1 = luaK_exp2RK(fs, e1); + int o2 = luaK_exp2RK(fs, e2); + freeexp(fs, e2); + freeexp(fs, e1); + if (cond == 0 && op != OP_EQ) { + int temp; /* exchange args to replace by `<' or `<=' */ + temp = o1; o1 = o2; o2 = temp; /* o1 <==> o2 */ + cond = 1; + } + e1->u.info = condjump(fs, op, cond, o1, o2); + e1->k = VJMP; +} + + +void luaK_prefix (FuncState *fs, UnOpr op, expdesc *e, int line) { + expdesc e2; + e2.t = e2.f = NO_JUMP; e2.k = VKNUM; e2.u.nval = 0; + switch (op) { + case OPR_MINUS: { + if (isnumeral(e)) /* minus constant? */ + e->u.nval = luai_numunm(NULL, e->u.nval); /* fold it */ + else { + luaK_exp2anyreg(fs, e); + codearith(fs, OP_UNM, e, &e2, line); + } + break; + } + case OPR_NOT: codenot(fs, e); break; + case OPR_LEN: { + luaK_exp2anyreg(fs, e); /* cannot operate on constants */ + codearith(fs, OP_LEN, e, &e2, line); + break; + } + default: lua_assert(0); + } +} + + +void luaK_infix (FuncState *fs, BinOpr op, expdesc *v) { + switch (op) { + case OPR_AND: { + luaK_goiftrue(fs, v); + break; + } + case OPR_OR: { + luaK_goiffalse(fs, v); + break; + } + case OPR_CONCAT: { + luaK_exp2nextreg(fs, v); /* operand must be on the `stack' */ + break; + } + case OPR_ADD: case OPR_SUB: case OPR_MUL: case OPR_DIV: + case OPR_MOD: case OPR_POW: { + if (!isnumeral(v)) luaK_exp2RK(fs, v); + break; + } + default: { + luaK_exp2RK(fs, v); + break; + } + } +} + + +void luaK_posfix (FuncState *fs, BinOpr op, + expdesc *e1, expdesc *e2, int line) { + switch (op) { + case OPR_AND: { + lua_assert(e1->t == NO_JUMP); /* list must be closed */ + luaK_dischargevars(fs, e2); + luaK_concat(fs, &e2->f, e1->f); + *e1 = *e2; + break; + } + case OPR_OR: { + lua_assert(e1->f == NO_JUMP); /* list must be closed */ + luaK_dischargevars(fs, e2); + luaK_concat(fs, &e2->t, e1->t); + *e1 = *e2; + break; + } + case OPR_CONCAT: { + luaK_exp2val(fs, e2); + if (e2->k == VRELOCABLE && GET_OPCODE(getcode(fs, e2)) == OP_CONCAT) { + lua_assert(e1->u.info == GETARG_B(getcode(fs, e2))-1); + freeexp(fs, e1); + SETARG_B(getcode(fs, e2), e1->u.info); + e1->k = VRELOCABLE; e1->u.info = e2->u.info; + } + else { + luaK_exp2nextreg(fs, e2); /* operand must be on the 'stack' */ + codearith(fs, OP_CONCAT, e1, e2, line); + } + break; + } + case OPR_ADD: case OPR_SUB: case OPR_MUL: case OPR_DIV: + case OPR_MOD: case OPR_POW: { + codearith(fs, cast(OpCode, op - OPR_ADD + OP_ADD), e1, e2, line); + break; + } + case OPR_EQ: case OPR_LT: case OPR_LE: { + codecomp(fs, cast(OpCode, op - OPR_EQ + OP_EQ), 1, e1, e2); + break; + } + case OPR_NE: case OPR_GT: case OPR_GE: { + codecomp(fs, 
cast(OpCode, op - OPR_NE + OP_EQ), 0, e1, e2);
+      break;
+    }
+    default: lua_assert(0);
+  }
+}
+
+
+void luaK_fixline (FuncState *fs, int line) {
+  fs->f->lineinfo[fs->pc - 1] = line;
+}
+
+
+void luaK_setlist (FuncState *fs, int base, int nelems, int tostore) {
+  int c =  (nelems - 1)/LFIELDS_PER_FLUSH + 1;
+  int b = (tostore == LUA_MULTRET) ? 0 : tostore;
+  lua_assert(tostore != 0);
+  if (c <= MAXARG_C)
+    luaK_codeABC(fs, OP_SETLIST, base, b, c);
+  else if (c <= MAXARG_Ax) {
+    luaK_codeABC(fs, OP_SETLIST, base, b, 0);
+    codeextraarg(fs, c);
+  }
+  else
+    luaX_syntaxerror(fs->ls, "constructor too long");
+  fs->freereg = base + 1;  /* free registers with list values */
+}
+
diff --git a/ext/lua/src/lcorolib.c b/ext/lua/src/lcorolib.c
new file mode 100644
index 000000000..1326c8146
--- /dev/null
+++ b/ext/lua/src/lcorolib.c
@@ -0,0 +1,155 @@
+/*
+** $Id: lcorolib.c,v 1.5 2013/02/21 13:44:53 roberto Exp $
+** Coroutine Library
+** See Copyright Notice in lua.h
+*/
+
+
+#include <stdlib.h>
+
+
+#define lcorolib_c
+#define LUA_LIB
+
+#include "lua.h"
+
+#include "lauxlib.h"
+#include "lualib.h"
+
+
+static int auxresume (lua_State *L, lua_State *co, int narg) {
+  int status;
+  if (!lua_checkstack(co, narg)) {
+    lua_pushliteral(L, "too many arguments to resume");
+    return -1;  /* error flag */
+  }
+  if (lua_status(co) == LUA_OK && lua_gettop(co) == 0) {
+    lua_pushliteral(L, "cannot resume dead coroutine");
+    return -1;  /* error flag */
+  }
+  lua_xmove(L, co, narg);
+  status = lua_resume(co, L, narg);
+  if (status == LUA_OK || status == LUA_YIELD) {
+    int nres = lua_gettop(co);
+    if (!lua_checkstack(L, nres + 1)) {
+      lua_pop(co, nres);  /* remove results anyway */
+      lua_pushliteral(L, "too many results to resume");
+      return -1;  /* error flag */
+    }
+    lua_xmove(co, L, nres);  /* move yielded values */
+    return nres;
+  }
+  else {
+    lua_xmove(co, L, 1);  /* move error message */
+    return -1;  /* error flag */
+  }
+}
+
+
+static int luaB_coresume (lua_State *L) {
+  lua_State *co = lua_tothread(L, 1);
+  int r;
+  luaL_argcheck(L, co, 1, "coroutine expected");
+  r = auxresume(L, co, lua_gettop(L) - 1);
+  if (r < 0) {
+    lua_pushboolean(L, 0);
+    lua_insert(L, -2);
+    return 2;  /* return false + error message */
+  }
+  else {
+    lua_pushboolean(L, 1);
+    lua_insert(L, -(r + 1));
+    return r + 1;  /* return true + `resume' returns */
+  }
+}
+
+
+static int luaB_auxwrap (lua_State *L) {
+  lua_State *co = lua_tothread(L, lua_upvalueindex(1));
+  int r = auxresume(L, co, lua_gettop(L));
+  if (r < 0) {
+    if (lua_isstring(L, -1)) {  /* error object is a string? */
+      luaL_where(L, 1);  /* add extra info */
+      lua_insert(L, -2);
+      lua_concat(L, 2);
+    }
+    return lua_error(L);  /* propagate error */
+  }
+  return r;
+}
+
+
+static int luaB_cocreate (lua_State *L) {
+  lua_State *NL;
+  luaL_checktype(L, 1, LUA_TFUNCTION);
+  NL = lua_newthread(L);
+  lua_pushvalue(L, 1);  /* move function to top */
+  lua_xmove(L, NL, 1);  /* move function from L to NL */
+  return 1;
+}
+
+
+static int luaB_cowrap (lua_State *L) {
+  luaB_cocreate(L);
+  lua_pushcclosure(L, luaB_auxwrap, 1);
+  return 1;
+}
+
+
+static int luaB_yield (lua_State *L) {
+  return lua_yield(L, lua_gettop(L));
+}
+
+
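Reader's note, not part of the patch: auxresume above is a thin wrapper over the resume cycle of the C API. A minimal host-side sketch of that cycle follows; run_coroutine is an illustrative name, and it assumes the standard libraries were already opened with luaL_openlibs.

/* hedged sketch: drive a Lua coroutine from C, mirroring auxresume() */
#include "lua.h"
#include "lauxlib.h"

static void run_coroutine (lua_State *L) {
  lua_State *co = lua_newthread(L);          /* like coroutine.create */
  luaL_loadstring(co, "coroutine.yield(1); return 2");
  for (;;) {
    int status = lua_resume(co, L, 0);       /* resume with no arguments */
    if (status != LUA_YIELD) break;          /* LUA_OK or an error code */
    lua_pop(co, lua_gettop(co));             /* discard yielded values */
  }
  lua_pop(L, 1);                             /* drop the thread object */
}

+static int luaB_costatus (lua_State *L) {
+  lua_State *co = lua_tothread(L, 1);
+  luaL_argcheck(L, co, 1, "coroutine expected");
+  if (L == co) lua_pushliteral(L, "running");
+  else {
+    switch (lua_status(co)) {
+      case LUA_YIELD:
+        lua_pushliteral(L, "suspended");
+        break;
+      case LUA_OK: {
+        lua_Debug ar;
+        if (lua_getstack(co, 0, &ar) > 0)  /* does it have frames? */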
+          lua_pushliteral(L, "normal");  /* it is running */
+        else if (lua_gettop(co) == 0)
+          lua_pushliteral(L, "dead");
+        else
+          lua_pushliteral(L, "suspended");  /* initial state */
+        break;
+      }
+      default:  /* some error occurred */
+        lua_pushliteral(L, "dead");
+        break;
+    }
+  }
+  return 1;
+}
+
+
+static int luaB_corunning (lua_State *L) {
+  int ismain = lua_pushthread(L);
+  lua_pushboolean(L, ismain);
+  return 2;
+}
+
+
+static const luaL_Reg co_funcs[] = {
+  {"create", luaB_cocreate},
+  {"resume", luaB_coresume},
+  {"running", luaB_corunning},
+  {"status", luaB_costatus},
+  {"wrap", luaB_cowrap},
+  {"yield", luaB_yield},
+  {NULL, NULL}
+};
+
+
+
+LUAMOD_API int luaopen_coroutine (lua_State *L) {
+  luaL_newlib(L, co_funcs);
+  return 1;
+}
+
diff --git a/ext/lua/src/lctype.c b/ext/lua/src/lctype.c
new file mode 100644
index 000000000..55e433a5d
--- /dev/null
+++ b/ext/lua/src/lctype.c
@@ -0,0 +1,52 @@
+/*
+** $Id: lctype.c,v 1.11 2011/10/03 16:19:23 roberto Exp $
+** 'ctype' functions for Lua
+** See Copyright Notice in lua.h
+*/
+
+#define lctype_c
+#define LUA_CORE
+
+#include "lctype.h"
+
+#if !LUA_USE_CTYPE  /* { */
+
+#include <limits.h>
+
+LUAI_DDEF const lu_byte luai_ctype_[UCHAR_MAX + 2] = {
+  0x00,  /* EOZ */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0. */
+  0x00, 0x08, 0x08, 0x08, 0x08, 0x08, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 1. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x0c, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, /* 2. */
+  0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
+  0x16, 0x16, 0x16, 0x16, 0x16, 0x16, 0x16, 0x16, /* 3. */
+  0x16, 0x16, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
+  0x04, 0x15, 0x15, 0x15, 0x15, 0x15, 0x15, 0x05, /* 4. */
+  0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
+  0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, /* 5. */
+  0x05, 0x05, 0x05, 0x04, 0x04, 0x04, 0x04, 0x05,
+  0x04, 0x15, 0x15, 0x15, 0x15, 0x15, 0x15, 0x05, /* 6. */
+  0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
+  0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, /* 7. */
+  0x05, 0x05, 0x05, 0x04, 0x04, 0x04, 0x04, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 8. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 9. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* a. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* b. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* c. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* d. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* e. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* f. */
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+};
+
+#endif  /* } */
diff --git a/ext/lua/src/ldblib.c b/ext/lua/src/ldblib.c
new file mode 100644
index 000000000..c02269457
--- /dev/null
+++ b/ext/lua/src/ldblib.c
@@ -0,0 +1,398 @@
+/*
+** $Id: ldblib.c,v 1.132 2012/01/19 20:14:44 roberto Exp $
+** Interface from Lua to its debug API
+** See Copyright Notice in lua.h
+*/
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define ldblib_c
+#define LUA_LIB
+
+#include "lua.h"
+
+#include "lauxlib.h"
+#include "lualib.h"
+
+
+#define HOOKKEY "_HKEY"
+
+
+
+static int db_getregistry (lua_State *L) {
+  lua_pushvalue(L, LUA_REGISTRYINDEX);
+  return 1;
+}
+
+
+static int db_getmetatable (lua_State *L) {
+  luaL_checkany(L, 1);
+  if (!lua_getmetatable(L, 1)) {
+    lua_pushnil(L);  /* no metatable */
+  }
+  return 1;
+}
+
+
+static int db_setmetatable (lua_State *L) {
+  int t = lua_type(L, 2);
+  luaL_argcheck(L, t == LUA_TNIL || t == LUA_TTABLE, 2,
+                "nil or table expected");
+  lua_settop(L, 2);
+  lua_setmetatable(L, 1);
+  return 1;  /* return 1st argument */
+}
+
+
+static int db_getuservalue (lua_State *L) {
+  if (lua_type(L, 1) != LUA_TUSERDATA)
+    lua_pushnil(L);
+  else
+    lua_getuservalue(L, 1);
+  return 1;
+}
+
+
+static int db_setuservalue (lua_State *L) {
+  if (lua_type(L, 1) == LUA_TLIGHTUSERDATA)
+    luaL_argerror(L, 1, "full userdata expected, got light userdata");
+  luaL_checktype(L, 1, LUA_TUSERDATA);
+  if (!lua_isnoneornil(L, 2))
+    luaL_checktype(L, 2, LUA_TTABLE);
+  lua_settop(L, 2);
+  lua_setuservalue(L, 1);
+  return 1;
+}
+
+
+static void settabss (lua_State *L, const char *i, const char *v) {
+  lua_pushstring(L, v);
+  lua_setfield(L, -2, i);
+}
+
+
+static void settabsi (lua_State *L, const char *i, int v) {
+  lua_pushinteger(L, v);
+  lua_setfield(L, -2, i);
+}
+
+
+static void settabsb (lua_State *L, const char *i, int v) {
+  lua_pushboolean(L, v);
+  lua_setfield(L, -2, i);
+}
+
+
+static lua_State *getthread (lua_State *L, int *arg) {
+  if (lua_isthread(L, 1)) {
+    *arg = 1;
+    return lua_tothread(L, 1);
+  }
+  else {
+    *arg = 0;
+    return L;
+  }
+}
+
+
+static void treatstackoption (lua_State *L, lua_State *L1, const char *fname) {
+  if (L == L1) {
+    lua_pushvalue(L, -2);
+    lua_remove(L, -3);
+  }
+  else
+    lua_xmove(L1, L, 1);
+  lua_setfield(L, -2, fname);
+}
+
+
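Reader's note, not part of the patch: db_getinfo below drives lua_getinfo, using the '>' option prefix to inspect a function pushed on the stack instead of a stack level. A hedged sketch of that protocol from C; print_func_info is an illustrative name and assumes a function sits on top of the stack.

/* hedged sketch of the '>' protocol used by db_getinfo() */
#include <stdio.h>
#include "lua.h"

static void print_func_info (lua_State *L) {
  lua_Debug ar;
  lua_pushvalue(L, -1);            /* lua_getinfo pops the pushed copy */
  if (lua_getinfo(L, ">u", &ar))   /* '>' = take function from the stack */
    printf("nups=%d nparams=%d\n", ar.nups, ar.nparams);
}

+static int db_getinfo (lua_State *L) {
+  lua_Debug ar;
+  int arg;
+  lua_State *L1 = getthread(L, &arg);
+  const char *options = luaL_optstring(L, arg+2, "flnStu");
+  if (lua_isnumber(L, arg+1)) {
+    if (!lua_getstack(L1, (int)lua_tointeger(L, arg+1), &ar)) {
+      lua_pushnil(L);  /* level out of range */
+      return 1;
+    }
+  }
+  else if (lua_isfunction(L, arg+1)) {
+    lua_pushfstring(L, ">%s", options);
+    options = lua_tostring(L, -1);
+    lua_pushvalue(L, arg+1);
+    lua_xmove(L, L1, 1);
+  }
+  else
+    return luaL_argerror(L, arg+1, "function or level expected");
+  if (!lua_getinfo(L1, options, &ar))
+    return luaL_argerror(L, arg+2, "invalid option");
+  lua_createtable(L, 0, 2);
+  if (strchr(options, 'S')) {
+    settabss(L, "source", ar.source);
+    settabss(L, "short_src", ar.short_src);
+    settabsi(L, "linedefined", ar.linedefined);
+    settabsi(L, "lastlinedefined", ar.lastlinedefined);
+    settabss(L, "what", ar.what);
+  }
+  if (strchr(options, 'l'))
+    settabsi(L, "currentline", ar.currentline);
+  if (strchr(options, 'u')) {
+    settabsi(L, "nups", ar.nups);
+    settabsi(L, "nparams", ar.nparams);
+    settabsb(L, "isvararg", ar.isvararg);
+  }
+  if (strchr(options, 'n')) {
+    settabss(L, "name", ar.name);
+    settabss(L, "namewhat", ar.namewhat);
+  }
+  if 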
(strchr(options, 't')) + settabsb(L, "istailcall", ar.istailcall); + if (strchr(options, 'L')) + treatstackoption(L, L1, "activelines"); + if (strchr(options, 'f')) + treatstackoption(L, L1, "func"); + return 1; /* return table */ +} + + +static int db_getlocal (lua_State *L) { + int arg; + lua_State *L1 = getthread(L, &arg); + lua_Debug ar; + const char *name; + int nvar = luaL_checkint(L, arg+2); /* local-variable index */ + if (lua_isfunction(L, arg + 1)) { /* function argument? */ + lua_pushvalue(L, arg + 1); /* push function */ + lua_pushstring(L, lua_getlocal(L, NULL, nvar)); /* push local name */ + return 1; + } + else { /* stack-level argument */ + if (!lua_getstack(L1, luaL_checkint(L, arg+1), &ar)) /* out of range? */ + return luaL_argerror(L, arg+1, "level out of range"); + name = lua_getlocal(L1, &ar, nvar); + if (name) { + lua_xmove(L1, L, 1); /* push local value */ + lua_pushstring(L, name); /* push name */ + lua_pushvalue(L, -2); /* re-order */ + return 2; + } + else { + lua_pushnil(L); /* no name (nor value) */ + return 1; + } + } +} + + +static int db_setlocal (lua_State *L) { + int arg; + lua_State *L1 = getthread(L, &arg); + lua_Debug ar; + if (!lua_getstack(L1, luaL_checkint(L, arg+1), &ar)) /* out of range? */ + return luaL_argerror(L, arg+1, "level out of range"); + luaL_checkany(L, arg+3); + lua_settop(L, arg+3); + lua_xmove(L, L1, 1); + lua_pushstring(L, lua_setlocal(L1, &ar, luaL_checkint(L, arg+2))); + return 1; +} + + +static int auxupvalue (lua_State *L, int get) { + const char *name; + int n = luaL_checkint(L, 2); + luaL_checktype(L, 1, LUA_TFUNCTION); + name = get ? lua_getupvalue(L, 1, n) : lua_setupvalue(L, 1, n); + if (name == NULL) return 0; + lua_pushstring(L, name); + lua_insert(L, -(get+1)); + return get + 1; +} + + +static int db_getupvalue (lua_State *L) { + return auxupvalue(L, 1); +} + + +static int db_setupvalue (lua_State *L) { + luaL_checkany(L, 3); + return auxupvalue(L, 0); +} + + +static int checkupval (lua_State *L, int argf, int argnup) { + lua_Debug ar; + int nup = luaL_checkint(L, argnup); + luaL_checktype(L, argf, LUA_TFUNCTION); + lua_pushvalue(L, argf); + lua_getinfo(L, ">u", &ar); + luaL_argcheck(L, 1 <= nup && nup <= ar.nups, argnup, "invalid upvalue index"); + return nup; +} + + +static int db_upvalueid (lua_State *L) { + int n = checkupval(L, 1, 2); + lua_pushlightuserdata(L, lua_upvalueid(L, 1, n)); + return 1; +} + + +static int db_upvaluejoin (lua_State *L) { + int n1 = checkupval(L, 1, 2); + int n2 = checkupval(L, 3, 4); + luaL_argcheck(L, !lua_iscfunction(L, 1), 1, "Lua function expected"); + luaL_argcheck(L, !lua_iscfunction(L, 3), 3, "Lua function expected"); + lua_upvaluejoin(L, 1, n1, 3, n2); + return 0; +} + + +#define gethooktable(L) luaL_getsubtable(L, LUA_REGISTRYINDEX, HOOKKEY) + + +static void hookf (lua_State *L, lua_Debug *ar) { + static const char *const hooknames[] = + {"call", "return", "line", "count", "tail call"}; + gethooktable(L); + lua_pushthread(L); + lua_rawget(L, -2); + if (lua_isfunction(L, -1)) { + lua_pushstring(L, hooknames[(int)ar->event]); + if (ar->currentline >= 0) + lua_pushinteger(L, ar->currentline); + else lua_pushnil(L); + lua_assert(lua_getinfo(L, "lS", ar)); + lua_call(L, 2, 0); + } +} + + +static int makemask (const char *smask, int count) { + int mask = 0; + if (strchr(smask, 'c')) mask |= LUA_MASKCALL; + if (strchr(smask, 'r')) mask |= LUA_MASKRET; + if (strchr(smask, 'l')) mask |= LUA_MASKLINE; + if (count > 0) mask |= LUA_MASKCOUNT; + return mask; +} + + +static char *unmakemask (int 
mask, char *smask) { + int i = 0; + if (mask & LUA_MASKCALL) smask[i++] = 'c'; + if (mask & LUA_MASKRET) smask[i++] = 'r'; + if (mask & LUA_MASKLINE) smask[i++] = 'l'; + smask[i] = '\0'; + return smask; +} + + +static int db_sethook (lua_State *L) { + int arg, mask, count; + lua_Hook func; + lua_State *L1 = getthread(L, &arg); + if (lua_isnoneornil(L, arg+1)) { + lua_settop(L, arg+1); + func = NULL; mask = 0; count = 0; /* turn off hooks */ + } + else { + const char *smask = luaL_checkstring(L, arg+2); + luaL_checktype(L, arg+1, LUA_TFUNCTION); + count = luaL_optint(L, arg+3, 0); + func = hookf; mask = makemask(smask, count); + } + if (gethooktable(L) == 0) { /* creating hook table? */ + lua_pushstring(L, "k"); + lua_setfield(L, -2, "__mode"); /** hooktable.__mode = "k" */ + lua_pushvalue(L, -1); + lua_setmetatable(L, -2); /* setmetatable(hooktable) = hooktable */ + } + lua_pushthread(L1); lua_xmove(L1, L, 1); + lua_pushvalue(L, arg+1); + lua_rawset(L, -3); /* set new hook */ + lua_sethook(L1, func, mask, count); /* set hooks */ + return 0; +} + + +static int db_gethook (lua_State *L) { + int arg; + lua_State *L1 = getthread(L, &arg); + char buff[5]; + int mask = lua_gethookmask(L1); + lua_Hook hook = lua_gethook(L1); + if (hook != NULL && hook != hookf) /* external hook? */ + lua_pushliteral(L, "external hook"); + else { + gethooktable(L); + lua_pushthread(L1); lua_xmove(L1, L, 1); + lua_rawget(L, -2); /* get hook */ + lua_remove(L, -2); /* remove hook table */ + } + lua_pushstring(L, unmakemask(mask, buff)); + lua_pushinteger(L, lua_gethookcount(L1)); + return 3; +} + + +static int db_debug (lua_State *L) { + for (;;) { + char buffer[250]; + luai_writestringerror("%s", "lua_debug> "); + if (fgets(buffer, sizeof(buffer), stdin) == 0 || + strcmp(buffer, "cont\n") == 0) + return 0; + if (luaL_loadbuffer(L, buffer, strlen(buffer), "=(debug command)") || + lua_pcall(L, 0, 0, 0)) + luai_writestringerror("%s\n", lua_tostring(L, -1)); + lua_settop(L, 0); /* remove eventual returns */ + } +} + + +static int db_traceback (lua_State *L) { + int arg; + lua_State *L1 = getthread(L, &arg); + const char *msg = lua_tostring(L, arg + 1); + if (msg == NULL && !lua_isnoneornil(L, arg + 1)) /* non-string 'msg'? */ + lua_pushvalue(L, arg + 1); /* return it untouched */ + else { + int level = luaL_optint(L, arg + 2, (L == L1) ? 
1 : 0);
+    luaL_traceback(L, L1, msg, level);
+  }
+  return 1;
+}
+
+
+static const luaL_Reg dblib[] = {
+  {"debug", db_debug},
+  {"getuservalue", db_getuservalue},
+  {"gethook", db_gethook},
+  {"getinfo", db_getinfo},
+  {"getlocal", db_getlocal},
+  {"getregistry", db_getregistry},
+  {"getmetatable", db_getmetatable},
+  {"getupvalue", db_getupvalue},
+  {"upvaluejoin", db_upvaluejoin},
+  {"upvalueid", db_upvalueid},
+  {"setuservalue", db_setuservalue},
+  {"sethook", db_sethook},
+  {"setlocal", db_setlocal},
+  {"setmetatable", db_setmetatable},
+  {"setupvalue", db_setupvalue},
+  {"traceback", db_traceback},
+  {NULL, NULL}
+};
+
+
+LUAMOD_API int luaopen_debug (lua_State *L) {
+  luaL_newlib(L, dblib);
+  return 1;
+}
+
diff --git a/ext/lua/src/ldebug.c b/ext/lua/src/ldebug.c
new file mode 100644
index 000000000..7e04f9d09
--- /dev/null
+++ b/ext/lua/src/ldebug.c
@@ -0,0 +1,580 @@
+/*
+** $Id: ldebug.c,v 2.90 2012/08/16 17:34:28 roberto Exp $
+** Debug Interface
+** See Copyright Notice in lua.h
+*/
+
+
+#include <stdarg.h>
+#include <stddef.h>
+#include <string.h>
+
+
+#define ldebug_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "lapi.h"
+#include "lcode.h"
+#include "ldebug.h"
+#include "ldo.h"
+#include "lfunc.h"
+#include "lobject.h"
+#include "lopcodes.h"
+#include "lstate.h"
+#include "lstring.h"
+#include "ltable.h"
+#include "ltm.h"
+#include "lvm.h"
+
+
+
+#define noLuaClosure(f)  ((f) == NULL || (f)->c.tt == LUA_TCCL)
+
+
+static const char *getfuncname (lua_State *L, CallInfo *ci, const char **name);
+
+
+static int currentpc (CallInfo *ci) {
+  lua_assert(isLua(ci));
+  return pcRel(ci->u.l.savedpc, ci_func(ci)->p);
+}
+
+
+static int currentline (CallInfo *ci) {
+  return getfuncline(ci_func(ci)->p, currentpc(ci));
+}
+
+
+/*
+** this function can be called asynchronous (e.g. during a signal)
+*/
+LUA_API int lua_sethook (lua_State *L, lua_Hook func, int mask, int count) {
+  if (func == NULL || mask == 0) {  /* turn off hooks? */
+    mask = 0;
+    func = NULL;
+  }
+  if (isLua(L->ci))
+    L->oldpc = L->ci->u.l.savedpc;
+  L->hook = func;
+  L->basehookcount = count;
+  resethookcount(L);
+  L->hookmask = cast_byte(mask);
+  return 1;
+}
+
+
+LUA_API lua_Hook lua_gethook (lua_State *L) {
+  return L->hook;
+}
+
+
+LUA_API int lua_gethookmask (lua_State *L) {
+  return L->hookmask;
+}
+
+
+LUA_API int lua_gethookcount (lua_State *L) {
+  return L->basehookcount;
+}
+
+
+LUA_API int lua_getstack (lua_State *L, int level, lua_Debug *ar) {
+  int status;
+  CallInfo *ci;
+  if (level < 0) return 0;  /* invalid (negative) level */
+  lua_lock(L);
+  for (ci = L->ci; level > 0 && ci != &L->base_ci; ci = ci->previous)
+    level--;
+  if (level == 0 && ci != &L->base_ci) {  /* level found? */
+    status = 1;
+    ar->i_ci = ci;
+  }
+  else status = 0;  /* no such level */
+  lua_unlock(L);
+  return status;
+}
+
+
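Reader's note, not part of the patch: lua_getstack above is the level-walking primitive that luaL_traceback and the debug library build on. A hedged sketch of the usual walk; dump_stack is an illustrative name.

/* hedged sketch: walk the call stack one activation record per level */
#include <stdio.h>
#include "lua.h"

static void dump_stack (lua_State *L) {
  lua_Debug ar;
  int level = 0;
  while (lua_getstack(L, level++, &ar)) {  /* fill ar.i_ci for this level */
    lua_getinfo(L, "Sl", &ar);             /* fill in source and line */
    printf("%s:%d\n", ar.short_src, ar.currentline);
  }
}

+static const char *upvalname (Proto *p, int uv) {
+  TString *s = check_exp(uv < p->sizeupvalues, p->upvalues[uv].name);
+  if (s == NULL) return "?";
+  else return getstr(s);
+}
+
+
+static const char *findvararg (CallInfo *ci, int n, StkId *pos) {
+  int nparams = clLvalue(ci->func)->p->numparams;
+  if (n >= ci->u.l.base - ci->func - nparams)
+    return NULL;  /* no such vararg */
+  else {
+    *pos = ci->func + nparams + n;
+    return "(*vararg)";  /* generic name for any vararg */
+  }
+}
+
+
+static const char *findlocal (lua_State *L, CallInfo *ci, int n,
+                              StkId *pos) {
+  const char *name = NULL;
+  StkId base;
+  if (isLua(ci)) {
+    if (n < 0)  /* access to vararg values? 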
*/ + return findvararg(ci, -n, pos); + else { + base = ci->u.l.base; + name = luaF_getlocalname(ci_func(ci)->p, n, currentpc(ci)); + } + } + else + base = ci->func + 1; + if (name == NULL) { /* no 'standard' name? */ + StkId limit = (ci == L->ci) ? L->top : ci->next->func; + if (limit - base >= n && n > 0) /* is 'n' inside 'ci' stack? */ + name = "(*temporary)"; /* generic name for any valid slot */ + else + return NULL; /* no name */ + } + *pos = base + (n - 1); + return name; +} + + +LUA_API const char *lua_getlocal (lua_State *L, const lua_Debug *ar, int n) { + const char *name; + lua_lock(L); + if (ar == NULL) { /* information about non-active function? */ + if (!isLfunction(L->top - 1)) /* not a Lua function? */ + name = NULL; + else /* consider live variables at function start (parameters) */ + name = luaF_getlocalname(clLvalue(L->top - 1)->p, n, 0); + } + else { /* active function; get information through 'ar' */ + StkId pos = 0; /* to avoid warnings */ + name = findlocal(L, ar->i_ci, n, &pos); + if (name) { + setobj2s(L, L->top, pos); + api_incr_top(L); + } + } + lua_unlock(L); + return name; +} + + +LUA_API const char *lua_setlocal (lua_State *L, const lua_Debug *ar, int n) { + StkId pos = 0; /* to avoid warnings */ + const char *name = findlocal(L, ar->i_ci, n, &pos); + lua_lock(L); + if (name) + setobjs2s(L, pos, L->top - 1); + L->top--; /* pop value */ + lua_unlock(L); + return name; +} + + +static void funcinfo (lua_Debug *ar, Closure *cl) { + if (noLuaClosure(cl)) { + ar->source = "=[C]"; + ar->linedefined = -1; + ar->lastlinedefined = -1; + ar->what = "C"; + } + else { + Proto *p = cl->l.p; + ar->source = p->source ? getstr(p->source) : "=?"; + ar->linedefined = p->linedefined; + ar->lastlinedefined = p->lastlinedefined; + ar->what = (ar->linedefined == 0) ? "main" : "Lua"; + } + luaO_chunkid(ar->short_src, ar->source, LUA_IDSIZE); +} + + +static void collectvalidlines (lua_State *L, Closure *f) { + if (noLuaClosure(f)) { + setnilvalue(L->top); + api_incr_top(L); + } + else { + int i; + TValue v; + int *lineinfo = f->l.p->lineinfo; + Table *t = luaH_new(L); /* new table to store active lines */ + sethvalue(L, L->top, t); /* push it on stack */ + api_incr_top(L); + setbvalue(&v, 1); /* boolean 'true' to be the value of all indices */ + for (i = 0; i < f->l.p->sizelineinfo; i++) /* for all lines with code */ + luaH_setint(L, t, lineinfo[i], &v); /* table[line] = true */ + } +} + + +static int auxgetinfo (lua_State *L, const char *what, lua_Debug *ar, + Closure *f, CallInfo *ci) { + int status = 1; + for (; *what; what++) { + switch (*what) { + case 'S': { + funcinfo(ar, f); + break; + } + case 'l': { + ar->currentline = (ci && isLua(ci)) ? currentline(ci) : -1; + break; + } + case 'u': { + ar->nups = (f == NULL) ? 0 : f->c.nupvalues; + if (noLuaClosure(f)) { + ar->isvararg = 1; + ar->nparams = 0; + } + else { + ar->isvararg = f->l.p->is_vararg; + ar->nparams = f->l.p->numparams; + } + break; + } + case 't': { + ar->istailcall = (ci) ? ci->callstatus & CIST_TAIL : 0; + break; + } + case 'n': { + /* calling function is a known Lua function? 
*/ + if (ci && !(ci->callstatus & CIST_TAIL) && isLua(ci->previous)) + ar->namewhat = getfuncname(L, ci->previous, &ar->name); + else + ar->namewhat = NULL; + if (ar->namewhat == NULL) { + ar->namewhat = ""; /* not found */ + ar->name = NULL; + } + break; + } + case 'L': + case 'f': /* handled by lua_getinfo */ + break; + default: status = 0; /* invalid option */ + } + } + return status; +} + + +LUA_API int lua_getinfo (lua_State *L, const char *what, lua_Debug *ar) { + int status; + Closure *cl; + CallInfo *ci; + StkId func; + lua_lock(L); + if (*what == '>') { + ci = NULL; + func = L->top - 1; + api_check(L, ttisfunction(func), "function expected"); + what++; /* skip the '>' */ + L->top--; /* pop function */ + } + else { + ci = ar->i_ci; + func = ci->func; + lua_assert(ttisfunction(ci->func)); + } + cl = ttisclosure(func) ? clvalue(func) : NULL; + status = auxgetinfo(L, what, ar, cl, ci); + if (strchr(what, 'f')) { + setobjs2s(L, L->top, func); + api_incr_top(L); + } + if (strchr(what, 'L')) + collectvalidlines(L, cl); + lua_unlock(L); + return status; +} + + +/* +** {====================================================== +** Symbolic Execution +** ======================================================= +*/ + +static const char *getobjname (Proto *p, int lastpc, int reg, + const char **name); + + +/* +** find a "name" for the RK value 'c' +*/ +static void kname (Proto *p, int pc, int c, const char **name) { + if (ISK(c)) { /* is 'c' a constant? */ + TValue *kvalue = &p->k[INDEXK(c)]; + if (ttisstring(kvalue)) { /* literal constant? */ + *name = svalue(kvalue); /* it is its own name */ + return; + } + /* else no reasonable name found */ + } + else { /* 'c' is a register */ + const char *what = getobjname(p, pc, c, name); /* search for 'c' */ + if (what && *what == 'c') { /* found a constant name? */ + return; /* 'name' already filled */ + } + /* else no reasonable name found */ + } + *name = "?"; /* no reasonable name found */ +} + + +/* +** try to find last instruction before 'lastpc' that modified register 'reg' +*/ +static int findsetreg (Proto *p, int lastpc, int reg) { + int pc; + int setreg = -1; /* keep last instruction that changed 'reg' */ + for (pc = 0; pc < lastpc; pc++) { + Instruction i = p->code[pc]; + OpCode op = GET_OPCODE(i); + int a = GETARG_A(i); + switch (op) { + case OP_LOADNIL: { + int b = GETARG_B(i); + if (a <= reg && reg <= a + b) /* set registers from 'a' to 'a+b' */ + setreg = pc; + break; + } + case OP_TFORCALL: { + if (reg >= a + 2) setreg = pc; /* affect all regs above its base */ + break; + } + case OP_CALL: + case OP_TAILCALL: { + if (reg >= a) setreg = pc; /* affect all registers above base */ + break; + } + case OP_JMP: { + int b = GETARG_sBx(i); + int dest = pc + 1 + b; + /* jump is forward and do not skip `lastpc'? */ + if (pc < dest && dest <= lastpc) + pc += b; /* do the jump */ + break; + } + case OP_TEST: { + if (reg == a) setreg = pc; /* jumped code can change 'a' */ + break; + } + default: + if (testAMode(op) && reg == a) /* any instruction that set A */ + setreg = pc; + break; + } + } + return setreg; +} + + +static const char *getobjname (Proto *p, int lastpc, int reg, + const char **name) { + int pc; + *name = luaF_getlocalname(p, reg + 1, lastpc); + if (*name) /* is a local? */ + return "local"; + /* else try symbolic execution */ + pc = findsetreg(p, lastpc, reg); + if (pc != -1) { /* could find instruction? 
*/ + Instruction i = p->code[pc]; + OpCode op = GET_OPCODE(i); + switch (op) { + case OP_MOVE: { + int b = GETARG_B(i); /* move from 'b' to 'a' */ + if (b < GETARG_A(i)) + return getobjname(p, pc, b, name); /* get name for 'b' */ + break; + } + case OP_GETTABUP: + case OP_GETTABLE: { + int k = GETARG_C(i); /* key index */ + int t = GETARG_B(i); /* table index */ + const char *vn = (op == OP_GETTABLE) /* name of indexed variable */ + ? luaF_getlocalname(p, t + 1, pc) + : upvalname(p, t); + kname(p, pc, k, name); + return (vn && strcmp(vn, LUA_ENV) == 0) ? "global" : "field"; + } + case OP_GETUPVAL: { + *name = upvalname(p, GETARG_B(i)); + return "upvalue"; + } + case OP_LOADK: + case OP_LOADKX: { + int b = (op == OP_LOADK) ? GETARG_Bx(i) + : GETARG_Ax(p->code[pc + 1]); + if (ttisstring(&p->k[b])) { + *name = svalue(&p->k[b]); + return "constant"; + } + break; + } + case OP_SELF: { + int k = GETARG_C(i); /* key index */ + kname(p, pc, k, name); + return "method"; + } + default: break; /* go through to return NULL */ + } + } + return NULL; /* could not find reasonable name */ +} + + +static const char *getfuncname (lua_State *L, CallInfo *ci, const char **name) { + TMS tm; + Proto *p = ci_func(ci)->p; /* calling function */ + int pc = currentpc(ci); /* calling instruction index */ + Instruction i = p->code[pc]; /* calling instruction */ + switch (GET_OPCODE(i)) { + case OP_CALL: + case OP_TAILCALL: /* get function name */ + return getobjname(p, pc, GETARG_A(i), name); + case OP_TFORCALL: { /* for iterator */ + *name = "for iterator"; + return "for iterator"; + } + /* all other instructions can call only through metamethods */ + case OP_SELF: + case OP_GETTABUP: + case OP_GETTABLE: tm = TM_INDEX; break; + case OP_SETTABUP: + case OP_SETTABLE: tm = TM_NEWINDEX; break; + case OP_EQ: tm = TM_EQ; break; + case OP_ADD: tm = TM_ADD; break; + case OP_SUB: tm = TM_SUB; break; + case OP_MUL: tm = TM_MUL; break; + case OP_DIV: tm = TM_DIV; break; + case OP_MOD: tm = TM_MOD; break; + case OP_POW: tm = TM_POW; break; + case OP_UNM: tm = TM_UNM; break; + case OP_LEN: tm = TM_LEN; break; + case OP_LT: tm = TM_LT; break; + case OP_LE: tm = TM_LE; break; + case OP_CONCAT: tm = TM_CONCAT; break; + default: + return NULL; /* else no useful name can be found */ + } + *name = getstr(G(L)->tmname[tm]); + return "metamethod"; +} + +/* }====================================================== */ + + + +/* +** only ANSI way to check whether a pointer points to an array +** (used only for error messages, so efficiency is not a big concern) +*/ +static int isinstack (CallInfo *ci, const TValue *o) { + StkId p; + for (p = ci->u.l.base; p < ci->top; p++) + if (o == p) return 1; + return 0; +} + + +static const char *getupvalname (CallInfo *ci, const TValue *o, + const char **name) { + LClosure *c = ci_func(ci); + int i; + for (i = 0; i < c->nupvalues; i++) { + if (c->upvals[i]->v == o) { + *name = upvalname(c->p, i); + return "upvalue"; + } + } + return NULL; +} + + +l_noret luaG_typeerror (lua_State *L, const TValue *o, const char *op) { + CallInfo *ci = L->ci; + const char *name = NULL; + const char *t = objtypename(o); + const char *kind = NULL; + if (isLua(ci)) { + kind = getupvalname(ci, o, &name); /* check whether 'o' is an upvalue */ + if (!kind && isinstack(ci, o)) /* no? 
try a register */
+      kind = getobjname(ci_func(ci)->p, currentpc(ci),
+                        cast_int(o - ci->u.l.base), &name);
+  }
+  if (kind)
+    luaG_runerror(L, "attempt to %s %s " LUA_QS " (a %s value)",
+                op, kind, name, t);
+  else
+    luaG_runerror(L, "attempt to %s a %s value", op, t);
+}
+
+
+l_noret luaG_concaterror (lua_State *L, StkId p1, StkId p2) {
+  if (ttisstring(p1) || ttisnumber(p1)) p1 = p2;
+  lua_assert(!ttisstring(p1) && !ttisnumber(p2));
+  luaG_typeerror(L, p1, "concatenate");
+}
+
+
+l_noret luaG_aritherror (lua_State *L, const TValue *p1, const TValue *p2) {
+  TValue temp;
+  if (luaV_tonumber(p1, &temp) == NULL)
+    p2 = p1;  /* first operand is wrong */
+  luaG_typeerror(L, p2, "perform arithmetic on");
+}
+
+
+l_noret luaG_ordererror (lua_State *L, const TValue *p1, const TValue *p2) {
+  const char *t1 = objtypename(p1);
+  const char *t2 = objtypename(p2);
+  if (t1 == t2)
+    luaG_runerror(L, "attempt to compare two %s values", t1);
+  else
+    luaG_runerror(L, "attempt to compare %s with %s", t1, t2);
+}
+
+
+static void addinfo (lua_State *L, const char *msg) {
+  CallInfo *ci = L->ci;
+  if (isLua(ci)) {  /* is Lua code? */
+    char buff[LUA_IDSIZE];  /* add file:line information */
+    int line = currentline(ci);
+    TString *src = ci_func(ci)->p->source;
+    if (src)
+      luaO_chunkid(buff, getstr(src), LUA_IDSIZE);
+    else {  /* no source available; use "?" instead */
+      buff[0] = '?'; buff[1] = '\0';
+    }
+    luaO_pushfstring(L, "%s:%d: %s", buff, line, msg);
+  }
+}
+
+
+l_noret luaG_errormsg (lua_State *L) {
+  if (L->errfunc != 0) {  /* is there an error handling function? */
+    StkId errfunc = restorestack(L, L->errfunc);
+    if (!ttisfunction(errfunc)) luaD_throw(L, LUA_ERRERR);
+    setobjs2s(L, L->top, L->top - 1);  /* move argument */
+    setobjs2s(L, L->top - 1, errfunc);  /* push function */
+    L->top++;
+    luaD_call(L, L->top - 2, 1, 0);  /* call it */
+  }
+  luaD_throw(L, LUA_ERRRUN);
+}
+
+
+l_noret luaG_runerror (lua_State *L, const char *fmt, ...) {
+  va_list argp;
+  va_start(argp, fmt);
+  addinfo(L, luaO_pushvfstring(L, fmt, argp));
+  va_end(argp);
+  luaG_errormsg(L);
+}
+
diff --git a/ext/lua/src/ldo.c b/ext/lua/src/ldo.c
new file mode 100644
index 000000000..aafa3dca2
--- /dev/null
+++ b/ext/lua/src/ldo.c
@@ -0,0 +1,673 @@
+/*
+** $Id: ldo.c,v 2.108 2012/10/01 14:05:04 roberto Exp $
+** Stack and Call structure of Lua
+** See Copyright Notice in lua.h
+*/
+
+
+#include <setjmp.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define ldo_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "lapi.h"
+#include "ldebug.h"
+#include "ldo.h"
+#include "lfunc.h"
+#include "lgc.h"
+#include "lmem.h"
+#include "lobject.h"
+#include "lopcodes.h"
+#include "lparser.h"
+#include "lstate.h"
+#include "lstring.h"
+#include "ltable.h"
+#include "ltm.h"
+#include "lundump.h"
+#include "lvm.h"
+#include "lzio.h"
+
+
+
+
+/*
+** {======================================================
+** Error-recovery functions
+** =======================================================
+*/
+
+/*
+** LUAI_THROW/LUAI_TRY define how Lua does exception handling. By
+** default, Lua handles errors with exceptions when compiling as
+** C++ code, with _longjmp/_setjmp when asked to use them, and with
+** longjmp/setjmp otherwise.
+*/
+#if !defined(LUAI_THROW)
+
+#if defined(__cplusplus) && !defined(LUA_USE_LONGJMP)
+/* C++ exceptions */
+#define LUAI_THROW(L,c)  throw(c)
+#define LUAI_TRY(L,c,a) \
+  try { a } catch(...) 
{ if ((c)->status == 0) (c)->status = -1; } +#define luai_jmpbuf int /* dummy variable */ + +#elif defined(LUA_USE_ULONGJMP) +/* in Unix, try _longjmp/_setjmp (more efficient) */ +#define LUAI_THROW(L,c) _longjmp((c)->b, 1) +#define LUAI_TRY(L,c,a) if (_setjmp((c)->b) == 0) { a } +#define luai_jmpbuf jmp_buf + +#else +/* default handling with long jumps */ +#define LUAI_THROW(L,c) longjmp((c)->b, 1) +#define LUAI_TRY(L,c,a) if (setjmp((c)->b) == 0) { a } +#define luai_jmpbuf jmp_buf + +#endif + +#endif + + + +/* chain list of long jump buffers */ +struct lua_longjmp { + struct lua_longjmp *previous; + luai_jmpbuf b; + volatile int status; /* error code */ +}; + + +static void seterrorobj (lua_State *L, int errcode, StkId oldtop) { + switch (errcode) { + case LUA_ERRMEM: { /* memory error? */ + setsvalue2s(L, oldtop, G(L)->memerrmsg); /* reuse preregistered msg. */ + break; + } + case LUA_ERRERR: { + setsvalue2s(L, oldtop, luaS_newliteral(L, "error in error handling")); + break; + } + default: { + setobjs2s(L, oldtop, L->top - 1); /* error message on current top */ + break; + } + } + L->top = oldtop + 1; +} + + +l_noret luaD_throw (lua_State *L, int errcode) { + if (L->errorJmp) { /* thread has an error handler? */ + L->errorJmp->status = errcode; /* set status */ + LUAI_THROW(L, L->errorJmp); /* jump to it */ + } + else { /* thread has no error handler */ + L->status = cast_byte(errcode); /* mark it as dead */ + if (G(L)->mainthread->errorJmp) { /* main thread has a handler? */ + setobjs2s(L, G(L)->mainthread->top++, L->top - 1); /* copy error obj. */ + luaD_throw(G(L)->mainthread, errcode); /* re-throw in main thread */ + } + else { /* no handler at all; abort */ + if (G(L)->panic) { /* panic function? */ + lua_unlock(L); + G(L)->panic(L); /* call it (last chance to jump out) */ + } + abort(); + } + } +} + + +int luaD_rawrunprotected (lua_State *L, Pfunc f, void *ud) { + unsigned short oldnCcalls = L->nCcalls; + struct lua_longjmp lj; + lj.status = LUA_OK; + lj.previous = L->errorJmp; /* chain new error handler */ + L->errorJmp = &lj; + LUAI_TRY(L, &lj, + (*f)(L, ud); + ); + L->errorJmp = lj.previous; /* restore old error handler */ + L->nCcalls = oldnCcalls; + return lj.status; +} + +/* }====================================================== */ + + +static void correctstack (lua_State *L, TValue *oldstack) { + CallInfo *ci; + GCObject *up; + L->top = (L->top - oldstack) + L->stack; + for (up = L->openupval; up != NULL; up = up->gch.next) + gco2uv(up)->v = (gco2uv(up)->v - oldstack) + L->stack; + for (ci = L->ci; ci != NULL; ci = ci->previous) { + ci->top = (ci->top - oldstack) + L->stack; + ci->func = (ci->func - oldstack) + L->stack; + if (isLua(ci)) + ci->u.l.base = (ci->u.l.base - oldstack) + L->stack; + } +} + + +/* some space for error handling */ +#define ERRORSTACKSIZE (LUAI_MAXSTACK + 200) + + +void luaD_reallocstack (lua_State *L, int newsize) { + TValue *oldstack = L->stack; + int lim = L->stacksize; + lua_assert(newsize <= LUAI_MAXSTACK || newsize == ERRORSTACKSIZE); + lua_assert(L->stack_last - L->stack == L->stacksize - EXTRA_STACK); + luaM_reallocvector(L, L->stack, L->stacksize, newsize, TValue); + for (; lim < newsize; lim++) + setnilvalue(L->stack + lim); /* erase new segment */ + L->stacksize = newsize; + L->stack_last = L->stack + newsize - EXTRA_STACK; + correctstack(L, oldstack); +} + + +void luaD_growstack (lua_State *L, int n) { + int size = L->stacksize; + if (size > LUAI_MAXSTACK) /* error after extra size? 
*/
+    luaD_throw(L, LUA_ERRERR);
+  else {
+    int needed = cast_int(L->top - L->stack) + n + EXTRA_STACK;
+    int newsize = 2 * size;
+    if (newsize > LUAI_MAXSTACK) newsize = LUAI_MAXSTACK;
+    if (newsize < needed) newsize = needed;
+    if (newsize > LUAI_MAXSTACK) {  /* stack overflow? */
+      luaD_reallocstack(L, ERRORSTACKSIZE);
+      luaG_runerror(L, "stack overflow");
+    }
+    else
+      luaD_reallocstack(L, newsize);
+  }
+}
+
+
+static int stackinuse (lua_State *L) {
+  CallInfo *ci;
+  StkId lim = L->top;
+  for (ci = L->ci; ci != NULL; ci = ci->previous) {
+    lua_assert(ci->top <= L->stack_last);
+    if (lim < ci->top) lim = ci->top;
+  }
+  return cast_int(lim - L->stack) + 1;  /* part of stack in use */
+}
+
+
+void luaD_shrinkstack (lua_State *L) {
+  int inuse = stackinuse(L);
+  int goodsize = inuse + (inuse / 8) + 2*EXTRA_STACK;
+  if (goodsize > LUAI_MAXSTACK) goodsize = LUAI_MAXSTACK;
+  if (inuse > LUAI_MAXSTACK ||  /* handling stack overflow? */
+      goodsize >= L->stacksize)  /* would grow instead of shrink? */
+    condmovestack(L);  /* don't change stack (change only for debugging) */
+  else
+    luaD_reallocstack(L, goodsize);  /* shrink it */
+}
+
+
+void luaD_hook (lua_State *L, int event, int line) {
+  lua_Hook hook = L->hook;
+  if (hook && L->allowhook) {
+    CallInfo *ci = L->ci;
+    ptrdiff_t top = savestack(L, L->top);
+    ptrdiff_t ci_top = savestack(L, ci->top);
+    lua_Debug ar;
+    ar.event = event;
+    ar.currentline = line;
+    ar.i_ci = ci;
+    luaD_checkstack(L, LUA_MINSTACK);  /* ensure minimum stack size */
+    ci->top = L->top + LUA_MINSTACK;
+    lua_assert(ci->top <= L->stack_last);
+    L->allowhook = 0;  /* cannot call hooks inside a hook */
+    ci->callstatus |= CIST_HOOKED;
+    lua_unlock(L);
+    (*hook)(L, &ar);
+    lua_lock(L);
+    lua_assert(!L->allowhook);
+    L->allowhook = 1;
+    ci->top = restorestack(L, ci_top);
+    L->top = restorestack(L, top);
+    ci->callstatus &= ~CIST_HOOKED;
+  }
+}
+
+
+static void callhook (lua_State *L, CallInfo *ci) {
+  int hook = LUA_HOOKCALL;
+  ci->u.l.savedpc++;  /* hooks assume 'pc' is already incremented */
+  if (isLua(ci->previous) &&
+      GET_OPCODE(*(ci->previous->u.l.savedpc - 1)) == OP_TAILCALL) {
+    ci->callstatus |= CIST_TAIL;
+    hook = LUA_HOOKTAILCALL;
+  }
+  luaD_hook(L, hook, -1);
+  ci->u.l.savedpc--;  /* correct 'pc' */
+}
+
+
+static StkId adjust_varargs (lua_State *L, Proto *p, int actual) {
+  int i;
+  int nfixargs = p->numparams;
+  StkId base, fixed;
+  lua_assert(actual >= nfixargs);
+  /* move fixed parameters to final position */
+  fixed = L->top - actual;  /* first fixed argument */
+  base = L->top;  /* final position of first argument */
+  for (i=0; i<nfixargs; i++) {
+    setobjs2s(L, L->top++, fixed + i);
+    setnilvalue(fixed + i);
+  }
+  return base;
+}
+
+
+static StkId tryfuncTM (lua_State *L, StkId func) {
+  const TValue *tm = luaT_gettmbyobj(L, func, TM_CALL);
+  StkId p;
+  ptrdiff_t funcr = savestack(L, func);
+  if (!ttisfunction(tm))
+    luaG_typeerror(L, func, "call");
+  /* Open a hole inside the stack at `func' */
+  for (p = L->top; p > func; p--) setobjs2s(L, p, p-1);
+  incr_top(L);
+  func = restorestack(L, funcr);  /* previous call may change stack */
+  setobj2s(L, func, tm);  /* tag method is the new function to be called */
+  return func;
+}
+
+
+
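Reader's note, not part of the patch: tryfuncTM above is the path that makes any value with a __call metamethod callable; it inserts the metamethod before the original "function" slot and retries the call. A hedged sketch of the equivalent setup through the public API; make_callable and call_tm are illustrative names.

/* hedged sketch: a table made callable via __call, exercised with lua_call */
#include "lua.h"

static int call_tm (lua_State *L) {
  /* stack: the called table, then its arguments */
  lua_pushinteger(L, lua_gettop(L) - 1);  /* return the argument count */
  return 1;
}

static void make_callable (lua_State *L) {
  lua_newtable(L);                 /* the object to call */
  lua_newtable(L);                 /* its metatable */
  lua_pushcfunction(L, call_tm);
  lua_setfield(L, -2, "__call");
  lua_setmetatable(L, -2);
  lua_call(L, 0, 1);               /* not a function: goes through tryfuncTM */
  /* leaves integer 0 on the stack (call_tm saw only the table itself) */
}

+#define next_ci(L) (L->ci = (L->ci->next ? 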
L->ci->next : luaE_extendCI(L))) + + +/* +** returns true if function has been executed (C function) +*/ +int luaD_precall (lua_State *L, StkId func, int nresults) { + lua_CFunction f; + CallInfo *ci; + int n; /* number of arguments (Lua) or returns (C) */ + ptrdiff_t funcr = savestack(L, func); + switch (ttype(func)) { + case LUA_TLCF: /* light C function */ + f = fvalue(func); + goto Cfunc; + case LUA_TCCL: { /* C closure */ + f = clCvalue(func)->f; + Cfunc: + luaD_checkstack(L, LUA_MINSTACK); /* ensure minimum stack size */ + ci = next_ci(L); /* now 'enter' new function */ + ci->nresults = nresults; + ci->func = restorestack(L, funcr); + ci->top = L->top + LUA_MINSTACK; + lua_assert(ci->top <= L->stack_last); + ci->callstatus = 0; + luaC_checkGC(L); /* stack grow uses memory */ + if (L->hookmask & LUA_MASKCALL) + luaD_hook(L, LUA_HOOKCALL, -1); + lua_unlock(L); + n = (*f)(L); /* do the actual call */ + lua_lock(L); + api_checknelems(L, n); + luaD_poscall(L, L->top - n); + return 1; + } + case LUA_TLCL: { /* Lua function: prepare its call */ + StkId base; + Proto *p = clLvalue(func)->p; + luaD_checkstack(L, p->maxstacksize); + func = restorestack(L, funcr); + n = cast_int(L->top - func) - 1; /* number of real arguments */ + for (; n < p->numparams; n++) + setnilvalue(L->top++); /* complete missing arguments */ + base = (!p->is_vararg) ? func + 1 : adjust_varargs(L, p, n); + ci = next_ci(L); /* now 'enter' new function */ + ci->nresults = nresults; + ci->func = func; + ci->u.l.base = base; + ci->top = base + p->maxstacksize; + lua_assert(ci->top <= L->stack_last); + ci->u.l.savedpc = p->code; /* starting point */ + ci->callstatus = CIST_LUA; + L->top = ci->top; + luaC_checkGC(L); /* stack grow uses memory */ + if (L->hookmask & LUA_MASKCALL) + callhook(L, ci); + return 0; + } + default: { /* not a function */ + func = tryfuncTM(L, func); /* retry with 'function' tag method */ + return luaD_precall(L, func, nresults); /* now it must be a function */ + } + } +} + + +int luaD_poscall (lua_State *L, StkId firstResult) { + StkId res; + int wanted, i; + CallInfo *ci = L->ci; + if (L->hookmask & (LUA_MASKRET | LUA_MASKLINE)) { + if (L->hookmask & LUA_MASKRET) { + ptrdiff_t fr = savestack(L, firstResult); /* hook may change stack */ + luaD_hook(L, LUA_HOOKRET, -1); + firstResult = restorestack(L, fr); + } + L->oldpc = ci->previous->u.l.savedpc; /* 'oldpc' for caller function */ + } + res = ci->func; /* res == final position of 1st result */ + wanted = ci->nresults; + L->ci = ci = ci->previous; /* back to caller */ + /* move results to correct place */ + for (i = wanted; i != 0 && firstResult < L->top; i--) + setobjs2s(L, res++, firstResult++); + while (i-- > 0) + setnilvalue(res++); + L->top = res; + return (wanted - LUA_MULTRET); /* 0 iff wanted == LUA_MULTRET */ +} + + +/* +** Call a function (C or Lua). The function to be called is at *func. +** The arguments are on the stack, right after the function. +** When returns, all the results are on the stack, starting at the original +** function position. +*/ +void luaD_call (lua_State *L, StkId func, int nResults, int allowyield) { + if (++L->nCcalls >= LUAI_MAXCCALLS) { + if (L->nCcalls == LUAI_MAXCCALLS) + luaG_runerror(L, "C stack overflow"); + else if (L->nCcalls >= (LUAI_MAXCCALLS + (LUAI_MAXCCALLS>>3))) + luaD_throw(L, LUA_ERRERR); /* error while handing stack error */ + } + if (!allowyield) L->nny++; + if (!luaD_precall(L, func, nResults)) /* is a Lua function? 
*/ + luaV_execute(L); /* call it */ + if (!allowyield) L->nny--; + L->nCcalls--; +} + + +static void finishCcall (lua_State *L) { + CallInfo *ci = L->ci; + int n; + lua_assert(ci->u.c.k != NULL); /* must have a continuation */ + lua_assert(L->nny == 0); + if (ci->callstatus & CIST_YPCALL) { /* was inside a pcall? */ + ci->callstatus &= ~CIST_YPCALL; /* finish 'lua_pcall' */ + L->errfunc = ci->u.c.old_errfunc; + } + /* finish 'lua_callk'/'lua_pcall' */ + adjustresults(L, ci->nresults); + /* call continuation function */ + if (!(ci->callstatus & CIST_STAT)) /* no call status? */ + ci->u.c.status = LUA_YIELD; /* 'default' status */ + lua_assert(ci->u.c.status != LUA_OK); + ci->callstatus = (ci->callstatus & ~(CIST_YPCALL | CIST_STAT)) | CIST_YIELDED; + lua_unlock(L); + n = (*ci->u.c.k)(L); + lua_lock(L); + api_checknelems(L, n); + /* finish 'luaD_precall' */ + luaD_poscall(L, L->top - n); +} + + +static void unroll (lua_State *L, void *ud) { + UNUSED(ud); + for (;;) { + if (L->ci == &L->base_ci) /* stack is empty? */ + return; /* coroutine finished normally */ + if (!isLua(L->ci)) /* C function? */ + finishCcall(L); + else { /* Lua function */ + luaV_finishOp(L); /* finish interrupted instruction */ + luaV_execute(L); /* execute down to higher C 'boundary' */ + } + } +} + + +/* +** check whether thread has a suspended protected call +*/ +static CallInfo *findpcall (lua_State *L) { + CallInfo *ci; + for (ci = L->ci; ci != NULL; ci = ci->previous) { /* search for a pcall */ + if (ci->callstatus & CIST_YPCALL) + return ci; + } + return NULL; /* no pending pcall */ +} + + +static int recover (lua_State *L, int status) { + StkId oldtop; + CallInfo *ci = findpcall(L); + if (ci == NULL) return 0; /* no recovery point */ + /* "finish" luaD_pcall */ + oldtop = restorestack(L, ci->extra); + luaF_close(L, oldtop); + seterrorobj(L, status, oldtop); + L->ci = ci; + L->allowhook = ci->u.c.old_allowhook; + L->nny = 0; /* should be zero to be yieldable */ + luaD_shrinkstack(L); + L->errfunc = ci->u.c.old_errfunc; + ci->callstatus |= CIST_STAT; /* call has error status */ + ci->u.c.status = status; /* (here it is) */ + return 1; /* continue running the coroutine */ +} + + +/* +** signal an error in the call to 'resume', not in the execution of the +** coroutine itself. (Such errors should not be handled by any coroutine +** error handler and should not kill the coroutine.) +*/ +static l_noret resume_error (lua_State *L, const char *msg, StkId firstArg) { + L->top = firstArg; /* remove args from the stack */ + setsvalue2s(L, L->top, luaS_new(L, msg)); /* push error message */ + api_incr_top(L); + luaD_throw(L, -1); /* jump back to 'lua_resume' */ +} + + +/* +** do the work for 'lua_resume' in protected mode +*/ +static void resume (lua_State *L, void *ud) { + int nCcalls = L->nCcalls; + StkId firstArg = cast(StkId, ud); + CallInfo *ci = L->ci; + if (nCcalls >= LUAI_MAXCCALLS) + resume_error(L, "C stack overflow", firstArg); + if (L->status == LUA_OK) { /* may be starting a coroutine */ + if (ci != &L->base_ci) /* not in base level? */ + resume_error(L, "cannot resume non-suspended coroutine", firstArg); + /* coroutine is in base level; start running it */ + if (!luaD_precall(L, firstArg - 1, LUA_MULTRET)) /* Lua function? */ + luaV_execute(L); /* call it */ + } + else if (L->status != LUA_YIELD) + resume_error(L, "cannot resume dead coroutine", firstArg); + else { /* resuming from previous yield */ + L->status = LUA_OK; + ci->func = restorestack(L, ci->extra); + if (isLua(ci)) /* yielded inside a hook? 
*/ + luaV_execute(L); /* just continue running Lua code */ + else { /* 'common' yield */ + if (ci->u.c.k != NULL) { /* does it have a continuation? */ + int n; + ci->u.c.status = LUA_YIELD; /* 'default' status */ + ci->callstatus |= CIST_YIELDED; + lua_unlock(L); + n = (*ci->u.c.k)(L); /* call continuation */ + lua_lock(L); + api_checknelems(L, n); + firstArg = L->top - n; /* yield results come from continuation */ + } + luaD_poscall(L, firstArg); /* finish 'luaD_precall' */ + } + unroll(L, NULL); + } + lua_assert(nCcalls == L->nCcalls); +} + + +LUA_API int lua_resume (lua_State *L, lua_State *from, int nargs) { + int status; + lua_lock(L); + luai_userstateresume(L, nargs); + L->nCcalls = (from) ? from->nCcalls + 1 : 1; + L->nny = 0; /* allow yields */ + api_checknelems(L, (L->status == LUA_OK) ? nargs + 1 : nargs); + status = luaD_rawrunprotected(L, resume, L->top - nargs); + if (status == -1) /* error calling 'lua_resume'? */ + status = LUA_ERRRUN; + else { /* yield or regular error */ + while (status != LUA_OK && status != LUA_YIELD) { /* error? */ + if (recover(L, status)) /* recover point? */ + status = luaD_rawrunprotected(L, unroll, NULL); /* run continuation */ + else { /* unrecoverable error */ + L->status = cast_byte(status); /* mark thread as `dead' */ + seterrorobj(L, status, L->top); + L->ci->top = L->top; + break; + } + } + lua_assert(status == L->status); + } + L->nny = 1; /* do not allow yields */ + L->nCcalls--; + lua_assert(L->nCcalls == ((from) ? from->nCcalls : 0)); + lua_unlock(L); + return status; +} + + +LUA_API int lua_yieldk (lua_State *L, int nresults, int ctx, lua_CFunction k) { + CallInfo *ci = L->ci; + luai_userstateyield(L, nresults); + lua_lock(L); + api_checknelems(L, nresults); + if (L->nny > 0) { + if (L != G(L)->mainthread) + luaG_runerror(L, "attempt to yield across a C-call boundary"); + else + luaG_runerror(L, "attempt to yield from outside a coroutine"); + } + L->status = LUA_YIELD; + ci->extra = savestack(L, ci->func); /* save current 'func' */ + if (isLua(ci)) { /* inside a hook? */ + api_check(L, k == NULL, "hooks cannot continue after yielding"); + } + else { + if ((ci->u.c.k = k) != NULL) /* is there a continuation? */ + ci->u.c.ctx = ctx; /* save context */ + ci->func = L->top - nresults - 1; /* protect stack below results */ + luaD_throw(L, LUA_YIELD); + } + lua_assert(ci->callstatus & CIST_HOOKED); /* must be inside a hook */ + lua_unlock(L); + return 0; /* return to 'luaD_hook' */ +} + + +int luaD_pcall (lua_State *L, Pfunc func, void *u, + ptrdiff_t old_top, ptrdiff_t ef) { + int status; + CallInfo *old_ci = L->ci; + lu_byte old_allowhooks = L->allowhook; + unsigned short old_nny = L->nny; + ptrdiff_t old_errfunc = L->errfunc; + L->errfunc = ef; + status = luaD_rawrunprotected(L, func, u); + if (status != LUA_OK) { /* an error occurred? */ + StkId oldtop = restorestack(L, old_top); + luaF_close(L, oldtop); /* close possible pending closures */ + seterrorobj(L, status, oldtop); + L->ci = old_ci; + L->allowhook = old_allowhooks; + L->nny = old_nny; + luaD_shrinkstack(L); + } + L->errfunc = old_errfunc; + return status; +} + + + +/* +** Execute a protected parser. 
+*/
+struct SParser {  /* data to `f_parser' */
+  ZIO *z;
+  Mbuffer buff;  /* dynamic structure used by the scanner */
+  Dyndata dyd;  /* dynamic structures used by the parser */
+  const char *mode;
+  const char *name;
+};
+
+
+static void checkmode (lua_State *L, const char *mode, const char *x) {
+  if (mode && strchr(mode, x[0]) == NULL) {
+    luaO_pushfstring(L,
+       "attempt to load a %s chunk (mode is " LUA_QS ")", x, mode);
+    luaD_throw(L, LUA_ERRSYNTAX);
+  }
+}
+
+
+static void f_parser (lua_State *L, void *ud) {
+  int i;
+  Closure *cl;
+  struct SParser *p = cast(struct SParser *, ud);
+  int c = zgetc(p->z);  /* read first character */
+  if (c == LUA_SIGNATURE[0]) {
+    checkmode(L, p->mode, "binary");
+    cl = luaU_undump(L, p->z, &p->buff, p->name);
+  }
+  else {
+    checkmode(L, p->mode, "text");
+    cl = luaY_parser(L, p->z, &p->buff, &p->dyd, p->name, c);
+  }
+  lua_assert(cl->l.nupvalues == cl->l.p->sizeupvalues);
+  for (i = 0; i < cl->l.nupvalues; i++) {  /* initialize upvalues */
+    UpVal *up = luaF_newupval(L);
+    cl->l.upvals[i] = up;
+    luaC_objbarrier(L, cl, up);
+  }
+}
+
+
+int luaD_protectedparser (lua_State *L, ZIO *z, const char *name,
+                                        const char *mode) {
+  struct SParser p;
+  int status;
+  L->nny++;  /* cannot yield during parsing */
+  p.z = z; p.name = name; p.mode = mode;
+  p.dyd.actvar.arr = NULL; p.dyd.actvar.size = 0;
+  p.dyd.gt.arr = NULL; p.dyd.gt.size = 0;
+  p.dyd.label.arr = NULL; p.dyd.label.size = 0;
+  luaZ_initbuffer(L, &p.buff);
+  status = luaD_pcall(L, f_parser, &p, savestack(L, L->top), L->errfunc);
+  luaZ_freebuffer(L, &p.buff);
+  luaM_freearray(L, p.dyd.actvar.arr, p.dyd.actvar.size);
+  luaM_freearray(L, p.dyd.gt.arr, p.dyd.gt.size);
+  luaM_freearray(L, p.dyd.label.arr, p.dyd.label.size);
+  L->nny--;
+  return status;
+}
+
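The resume/yield machinery above (lua_resume, lua_yieldk, and the recover/unroll helpers) is driven entirely through the public API. A minimal host-side sketch of one yield/resume round trip; the chunk text and variable names are illustrative, and only the standard Lua 5.2 C API is assumed:

#include <stdio.h>
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"

int main (void) {
  lua_State *L = luaL_newstate();
  luaL_openlibs(L);                  /* the thread below shares these globals */
  lua_State *co = lua_newthread(L);
  luaL_loadstring(co, "local a = coroutine.yield(10) return a + 1");
  int status = lua_resume(co, L, 0);           /* run until the yield */
  if (status == LUA_YIELD)
    printf("yielded: %d\n", (int)lua_tointeger(co, -1));   /* 10 */
  lua_pop(co, 1);
  lua_pushinteger(co, 5);            /* becomes the result of the yield */
  status = lua_resume(co, L, 1);     /* re-enter the suspended coroutine */
  if (status == LUA_OK)
    printf("returned: %d\n", (int)lua_tointeger(co, -1));  /* 6 */
  lua_close(L);
  return 0;
}

The first call takes the "may be starting a coroutine" branch of resume(); the second takes the "resuming from previous yield" branch and finishes through unroll().
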
diff --git a/ext/lua/src/ldump.c b/ext/lua/src/ldump.c
new file mode 100644
index 000000000..d5e6a47cb
--- /dev/null
+++ b/ext/lua/src/ldump.c
@@ -0,0 +1,173 @@
+/*
+** $Id: ldump.c,v 2.17 2012/01/23 23:02:10 roberto Exp $
+** save precompiled Lua chunks
+** See Copyright Notice in lua.h
+*/
+
+#include <stddef.h>
+
+#define ldump_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "lobject.h"
+#include "lstate.h"
+#include "lundump.h"
+
+typedef struct {
+ lua_State* L;
+ lua_Writer writer;
+ void* data;
+ int strip;
+ int status;
+} DumpState;
+
+#define DumpMem(b,n,size,D)	DumpBlock(b,(n)*(size),D)
+#define DumpVar(x,D)		DumpMem(&x,1,sizeof(x),D)
+
+static void DumpBlock(const void* b, size_t size, DumpState* D)
+{
+ if (D->status==0)
+ {
+  lua_unlock(D->L);
+  D->status=(*D->writer)(D->L,b,size,D->data);
+  lua_lock(D->L);
+ }
+}
+
+static void DumpChar(int y, DumpState* D)
+{
+ char x=(char)y;
+ DumpVar(x,D);
+}
+
+static void DumpInt(int x, DumpState* D)
+{
+ DumpVar(x,D);
+}
+
+static void DumpNumber(lua_Number x, DumpState* D)
+{
+ DumpVar(x,D);
+}
+
+static void DumpVector(const void* b, int n, size_t size, DumpState* D)
+{
+ DumpInt(n,D);
+ DumpMem(b,n,size,D);
+}
+
+static void DumpString(const TString* s, DumpState* D)
+{
+ if (s==NULL)
+ {
+  size_t size=0;
+  DumpVar(size,D);
+ }
+ else
+ {
+  size_t size=s->tsv.len+1;		/* include trailing '\0' */
+  DumpVar(size,D);
+  DumpBlock(getstr(s),size*sizeof(char),D);
+ }
+}
+
+#define DumpCode(f,D)	DumpVector(f->code,f->sizecode,sizeof(Instruction),D)
+
+static void DumpFunction(const Proto* f, DumpState* D);
+
+static void DumpConstants(const Proto* f, DumpState* D)
+{
+ int i,n=f->sizek;
+ DumpInt(n,D);
+ for (i=0; i<n; i++)
+ {
+  const TValue* o=&f->k[i];
+  DumpChar(ttypenv(o),D);
+  switch (ttypenv(o))
+  {
+   case LUA_TNIL:
+	break;
+   case LUA_TBOOLEAN:
+	DumpChar(bvalue(o),D);
+	break;
+   case LUA_TNUMBER:
+	DumpNumber(nvalue(o),D);
+	break;
+   case LUA_TSTRING:
+	DumpString(rawtsvalue(o),D);
+	break;
+   default: lua_assert(0);
+  }
+ }
+ n=f->sizep;
+ DumpInt(n,D);
+ for (i=0; i<n; i++) DumpFunction(f->p[i],D);
+}
+
+static void DumpUpvalues(const Proto* f, DumpState* D)
+{
+ int i,n=f->sizeupvalues;
+ DumpInt(n,D);
+ for (i=0; i<n; i++)
+ {
+  DumpChar(f->upvalues[i].instack,D);
+  DumpChar(f->upvalues[i].idx,D);
+ }
+}
+
+static void DumpDebug(const Proto* f, DumpState* D)
+{
+ int i,n;
+ DumpString((D->strip) ? NULL : f->source,D);
+ n= (D->strip) ? 0 : f->sizelineinfo;
+ DumpVector(f->lineinfo,n,sizeof(int),D);
+ n= (D->strip) ? 0 : f->sizelocvars;
+ DumpInt(n,D);
+ for (i=0; i<n; i++)
+ {
+  DumpString(f->locvars[i].varname,D);
+  DumpInt(f->locvars[i].startpc,D);
+  DumpInt(f->locvars[i].endpc,D);
+ }
+ n= (D->strip) ? 0 : f->sizeupvalues;
+ DumpInt(n,D);
+ for (i=0; i<n; i++) DumpString(f->upvalues[i].name,D);
+}
+
+static void DumpFunction(const Proto* f, DumpState* D)
+{
+ DumpInt(f->linedefined,D);
+ DumpInt(f->lastlinedefined,D);
+ DumpChar(f->numparams,D);
+ DumpChar(f->is_vararg,D);
+ DumpChar(f->maxstacksize,D);
+ DumpCode(f,D);
+ DumpConstants(f,D);
+ DumpUpvalues(f,D);
+ DumpDebug(f,D);
+}
+
+static void DumpHeader(DumpState* D)
+{
+ lu_byte h[LUAC_HEADERSIZE];
+ luaU_header(h);
+ DumpBlock(h,LUAC_HEADERSIZE,D);
+}
+
+/*
+** dump Lua function as precompiled chunk
+*/
+int luaU_dump (lua_State* L, const Proto* f, lua_Writer w, void* data, int strip)
+{
+ DumpState D;
+ D.L=L;
+ D.writer=w;
+ D.data=data;
+ D.strip=strip;
+ D.status=0;
+ DumpHeader(&D);
+ DumpFunction(f,&D);
+ return D.status;
+}
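luaU_dump is reached from the public API through lua_dump, which supplies the lua_Writer that DumpBlock invokes for every block. A short sketch that precompiles a chunk into a temporary file; file_writer is an illustrative name, but the writer contract (return 0 to keep going) is exactly what DumpBlock records in D->status:

#include <stdio.h>
#include "lua.h"
#include "lauxlib.h"

/* lua_Writer callback: forward each dumped block to a stdio stream */
static int file_writer (lua_State *L, const void *p, size_t sz, void *ud) {
  (void)L;
  return fwrite(p, 1, sz, (FILE *)ud) != sz;  /* non-zero stops the dump */
}

int main (void) {
  lua_State *L = luaL_newstate();
  FILE *f = tmpfile();
  if (f == NULL) return 1;
  luaL_loadstring(L, "return 2 + 3");  /* function to dump is at stack top */
  lua_dump(L, file_writer, f);         /* DumpHeader + DumpFunction, as above */
  /* the stream now starts with LUA_SIGNATURE, so f_parser would accept it
     only under a load mode containing 'b' (cf. checkmode) */
  fclose(f);
  lua_close(L);
  return 0;
}
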
diff --git a/ext/lua/src/lfunc.c b/ext/lua/src/lfunc.c
new file mode 100644
index 000000000..c2128405b
--- /dev/null
+++ b/ext/lua/src/lfunc.c
@@ -0,0 +1,161 @@
+/*
+** $Id: lfunc.c,v 2.30 2012/10/03 12:36:46 roberto Exp $
+** Auxiliary functions to manipulate prototypes and closures
+** See Copyright Notice in lua.h
+*/
+
+
+#include <stddef.h>
+
+#define lfunc_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "lfunc.h"
+#include "lgc.h"
+#include "lmem.h"
+#include "lobject.h"
+#include "lstate.h"
+
+
+
+Closure *luaF_newCclosure (lua_State *L, int n) {
+  Closure *c = &luaC_newobj(L, LUA_TCCL, sizeCclosure(n), NULL, 0)->cl;
+  c->c.nupvalues = cast_byte(n);
+  return c;
+}
+
+
+Closure *luaF_newLclosure (lua_State *L, int n) {
+  Closure *c = &luaC_newobj(L, LUA_TLCL, sizeLclosure(n), NULL, 0)->cl;
+  c->l.p = NULL;
+  c->l.nupvalues = cast_byte(n);
+  while (n--) c->l.upvals[n] = NULL;
+  return c;
+}
+
+
+UpVal *luaF_newupval (lua_State *L) {
+  UpVal *uv = &luaC_newobj(L, LUA_TUPVAL, sizeof(UpVal), NULL, 0)->uv;
+  uv->v = &uv->u.value;
+  setnilvalue(uv->v);
+  return uv;
+}
+
+
+UpVal *luaF_findupval (lua_State *L, StkId level) {
+  global_State *g = G(L);
+  GCObject **pp = &L->openupval;
+  UpVal *p;
+  UpVal *uv;
+  while (*pp != NULL && (p = gco2uv(*pp))->v >= level) {
+    GCObject *o = obj2gco(p);
+    lua_assert(p->v != &p->u.value);
+    lua_assert(!isold(o) || isold(obj2gco(L)));
+    if (p->v == level) {  /* found a corresponding upvalue? */
+      if (isdead(g, o))  /* is it dead? */
+        changewhite(o);  /* resurrect it */
+      return p;
+    }
+    pp = &p->next;
+  }
+  /* not found: create a new one */
+  uv = &luaC_newobj(L, LUA_TUPVAL, sizeof(UpVal), pp, 0)->uv;
+  uv->v = level;  /* current value lives in the stack */
+  uv->u.l.prev = &g->uvhead;  /* double link it in `uvhead' list */
+  uv->u.l.next = g->uvhead.u.l.next;
+  uv->u.l.next->u.l.prev = uv;
+  g->uvhead.u.l.next = uv;
+  lua_assert(uv->u.l.next->u.l.prev == uv && uv->u.l.prev->u.l.next == uv);
+  return uv;
+}
+
+
+static void unlinkupval (UpVal *uv) {
+  lua_assert(uv->u.l.next->u.l.prev == uv && uv->u.l.prev->u.l.next == uv);
+  uv->u.l.next->u.l.prev = uv->u.l.prev;  /* remove from `uvhead' list */
+  uv->u.l.prev->u.l.next = uv->u.l.next;
+}
+
+
+void luaF_freeupval (lua_State *L, UpVal *uv) {
+  if (uv->v != &uv->u.value)  /* is it open? */
+    unlinkupval(uv);  /* remove from open list */
+  luaM_free(L, uv);  /* free upvalue */
+}
+
+
+void luaF_close (lua_State *L, StkId level) {
+  UpVal *uv;
+  global_State *g = G(L);
+  while (L->openupval != NULL && (uv = gco2uv(L->openupval))->v >= level) {
+    GCObject *o = obj2gco(uv);
+    lua_assert(!isblack(o) && uv->v != &uv->u.value);
+    L->openupval = uv->next;  /* remove from `open' list */
+    if (isdead(g, o))
+      luaF_freeupval(L, uv);  /* free upvalue */
+    else {
+      unlinkupval(uv);  /* remove upvalue from 'uvhead' list */
+      setobj(L, &uv->u.value, uv->v);  /* move value to upvalue slot */
+      uv->v = &uv->u.value;  /* now current value lives here */
+      gch(o)->next = g->allgc;  /* link upvalue into 'allgc' list */
+      g->allgc = o;
+      luaC_checkupvalcolor(g, uv);
+    }
+  }
+}
+
+
+Proto *luaF_newproto (lua_State *L) {
+  Proto *f = &luaC_newobj(L, LUA_TPROTO, sizeof(Proto), NULL, 0)->p;
+  f->k = NULL;
+  f->sizek = 0;
+  f->p = NULL;
+  f->sizep = 0;
+  f->code = NULL;
+  f->cache = NULL;
+  f->sizecode = 0;
+  f->lineinfo = NULL;
+  f->sizelineinfo = 0;
+  f->upvalues = NULL;
+  f->sizeupvalues = 0;
+  f->numparams = 0;
+  f->is_vararg = 0;
+  f->maxstacksize = 0;
+  f->locvars = NULL;
+  f->sizelocvars = 0;
+  f->linedefined = 0;
+  f->lastlinedefined = 0;
+  f->source = NULL;
+  return f;
+}
+
+
+void luaF_freeproto (lua_State *L, Proto *f) {
+  luaM_freearray(L, f->code, f->sizecode);
+  luaM_freearray(L, f->p, f->sizep);
+  luaM_freearray(L, f->k, f->sizek);
+  luaM_freearray(L, f->lineinfo, f->sizelineinfo);
+  luaM_freearray(L, f->locvars, f->sizelocvars);
+  luaM_freearray(L, f->upvalues, f->sizeupvalues);
+  luaM_free(L, f);
+}
+
+
+/*
+** Look for n-th local variable at line `line' in function `func'.
+** Returns NULL if not found.
+*/
+const char *luaF_getlocalname (const Proto *f, int local_number, int pc) {
+  int i;
+  for (i = 0; i<f->sizelocvars && f->locvars[i].startpc <= pc; i++) {
+    if (pc < f->locvars[i].endpc) {  /* is variable active? */
+      local_number--;
+      if (local_number == 0)
+        return getstr(f->locvars[i].varname);
+    }
+  }
+  return NULL;  /* not found */
+}
+
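The open-upvalue list that luaF_findupval and luaF_close maintain is what lets two closures over the same local share one UpVal instead of each getting a copy. A host-side sketch that makes the sharing observable; the chunk text is illustrative and assumes the standard libraries are open for print:

#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"

int main (void) {
  lua_State *L = luaL_newstate();
  luaL_openlibs(L);
  /* both closures capture 'n': luaF_findupval returns for the second
     closure the same UpVal it created for the first one */
  luaL_dostring(L,
    "local n = 0\n"
    "local function inc () n = n + 1 end\n"
    "local function get () return n end\n"
    "inc(); inc()\n"
    "print(get())  -- prints 2: one shared upvalue, not two copies\n");
  lua_close(L);
  return 0;
}
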
diff --git a/ext/lua/src/lgc.c b/ext/lua/src/lgc.c
new file mode 100644
index 000000000..535e988ae
--- /dev/null
+++ b/ext/lua/src/lgc.c
@@ -0,0 +1,1213 @@
+/*
+** $Id: lgc.c,v 2.140 2013/03/16 21:10:18 roberto Exp $
+** Garbage Collector
+** See Copyright Notice in lua.h
+*/
+
+#include <string.h>
+
+#define lgc_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "ldebug.h"
+#include "ldo.h"
+#include "lfunc.h"
+#include "lgc.h"
+#include "lmem.h"
+#include "lobject.h"
+#include "lstate.h"
+#include "lstring.h"
+#include "ltable.h"
+#include "ltm.h"
+
+
+
+/*
+** cost of sweeping one element (the size of a small object divided
+** by some adjust for the sweep speed)
+*/
+#define GCSWEEPCOST	((sizeof(TString) + 4) / 4)
+
+/* maximum number of elements to sweep in each single step */
+#define GCSWEEPMAX	(cast_int((GCSTEPSIZE / GCSWEEPCOST) / 4))
+
+/* maximum number of finalizers to call in each GC step */
+#define GCFINALIZENUM	4
+
+
+/*
+** macro to adjust 'stepmul': 'stepmul' is actually used like
+** 'stepmul / STEPMULADJ' (value chosen by tests)
+*/
+#define STEPMULADJ		200
+
+
+/*
+** macro to adjust 'pause': 'pause' is actually used like
+** 'pause / PAUSEADJ' (value chosen by tests)
+*/
+#define PAUSEADJ		100
+
+
+/*
+** 'makewhite' erases all color bits plus the old bit and then
+** sets only the current white bit
+*/
+#define maskcolors	(~(bit2mask(BLACKBIT, OLDBIT) | WHITEBITS))
+#define makewhite(g,x)	\
+ (gch(x)->marked = cast_byte((gch(x)->marked & maskcolors) | luaC_white(g)))
+
+#define white2gray(x)	resetbits(gch(x)->marked, WHITEBITS)
+#define black2gray(x)	resetbit(gch(x)->marked, BLACKBIT)
+
+
+#define isfinalized(x)		testbit(gch(x)->marked, FINALIZEDBIT)
+
+#define checkdeadkey(n)	lua_assert(!ttisdeadkey(gkey(n)) || ttisnil(gval(n)))
+
+
+#define checkconsistency(obj)	\
+  lua_longassert(!iscollectable(obj) || righttt(obj))
+
+
+#define markvalue(g,o) { checkconsistency(o); \
+  if (valiswhite(o)) reallymarkobject(g,gcvalue(o)); }
+
+#define markobject(g,t) { if ((t) && iswhite(obj2gco(t))) \
+		reallymarkobject(g, obj2gco(t)); }
+
+static void reallymarkobject (global_State *g, GCObject *o);
+
+
+/*
+** {======================================================
+** Generic functions
+** =======================================================
+*/
+
+
+/*
+** one after last element in a hash array
+*/
+#define gnodelast(h)	gnode(h, cast(size_t, sizenode(h)))
+
+
+/*
+** link table 'h' into list pointed by 'p'
+*/
+#define linktable(h,p)	((h)->gclist = *(p), *(p) = obj2gco(h))
+
+
+/*
+** if key is not marked, mark its entry as dead (therefore removing it
+** from the table)
+*/
+static void removeentry (Node *n) {
+  lua_assert(ttisnil(gval(n)));
+  if (valiswhite(gkey(n)))
+    setdeadvalue(gkey(n));  /* unused and unmarked key; remove it */
+}
+
+
+/*
+** tells whether a key or value can be cleared from a weak
+** table. Non-collectable objects are never removed from weak
+** tables. Strings behave as `values', so are never removed too. 
for +** other objects: if really collected, cannot keep them; for objects +** being finalized, keep them in keys, but not in values +*/ +static int iscleared (global_State *g, const TValue *o) { + if (!iscollectable(o)) return 0; + else if (ttisstring(o)) { + markobject(g, rawtsvalue(o)); /* strings are `values', so are never weak */ + return 0; + } + else return iswhite(gcvalue(o)); +} + + +/* +** barrier that moves collector forward, that is, mark the white object +** being pointed by a black object. +*/ +void luaC_barrier_ (lua_State *L, GCObject *o, GCObject *v) { + global_State *g = G(L); + lua_assert(isblack(o) && iswhite(v) && !isdead(g, v) && !isdead(g, o)); + lua_assert(g->gcstate != GCSpause); + lua_assert(gch(o)->tt != LUA_TTABLE); + if (keepinvariantout(g)) /* must keep invariant? */ + reallymarkobject(g, v); /* restore invariant */ + else { /* sweep phase */ + lua_assert(issweepphase(g)); + makewhite(g, o); /* mark main obj. as white to avoid other barriers */ + } +} + + +/* +** barrier that moves collector backward, that is, mark the black object +** pointing to a white object as gray again. (Current implementation +** only works for tables; access to 'gclist' is not uniform across +** different types.) +*/ +void luaC_barrierback_ (lua_State *L, GCObject *o) { + global_State *g = G(L); + lua_assert(isblack(o) && !isdead(g, o) && gch(o)->tt == LUA_TTABLE); + black2gray(o); /* make object gray (again) */ + gco2t(o)->gclist = g->grayagain; + g->grayagain = o; +} + + +/* +** barrier for prototypes. When creating first closure (cache is +** NULL), use a forward barrier; this may be the only closure of the +** prototype (if it is a "regular" function, with a single instance) +** and the prototype may be big, so it is better to avoid traversing +** it again. Otherwise, use a backward barrier, to avoid marking all +** possible instances. +*/ +LUAI_FUNC void luaC_barrierproto_ (lua_State *L, Proto *p, Closure *c) { + global_State *g = G(L); + lua_assert(isblack(obj2gco(p))); + if (p->cache == NULL) { /* first time? */ + luaC_objbarrier(L, p, c); + } + else { /* use a backward barrier */ + black2gray(obj2gco(p)); /* make prototype gray (again) */ + p->gclist = g->grayagain; + g->grayagain = obj2gco(p); + } +} + + +/* +** check color (and invariants) for an upvalue that was closed, +** i.e., moved into the 'allgc' list +*/ +void luaC_checkupvalcolor (global_State *g, UpVal *uv) { + GCObject *o = obj2gco(uv); + lua_assert(!isblack(o)); /* open upvalues are never black */ + if (isgray(o)) { + if (keepinvariant(g)) { + resetoldbit(o); /* see MOVE OLD rule */ + gray2black(o); /* it is being visited now */ + markvalue(g, uv->v); + } + else { + lua_assert(issweepphase(g)); + makewhite(g, o); + } + } +} + + +/* +** create a new collectable object (with given type and size) and link +** it to '*list'. 'offset' tells how many bytes to allocate before the +** object itself (used only by states). 
+*/ +GCObject *luaC_newobj (lua_State *L, int tt, size_t sz, GCObject **list, + int offset) { + global_State *g = G(L); + char *raw = cast(char *, luaM_newobject(L, novariant(tt), sz)); + GCObject *o = obj2gco(raw + offset); + if (list == NULL) + list = &g->allgc; /* standard list for collectable objects */ + gch(o)->marked = luaC_white(g); + gch(o)->tt = tt; + gch(o)->next = *list; + *list = o; + return o; +} + +/* }====================================================== */ + + + +/* +** {====================================================== +** Mark functions +** ======================================================= +*/ + + +/* +** mark an object. Userdata, strings, and closed upvalues are visited +** and turned black here. Other objects are marked gray and added +** to appropriate list to be visited (and turned black) later. (Open +** upvalues are already linked in 'headuv' list.) +*/ +static void reallymarkobject (global_State *g, GCObject *o) { + lu_mem size; + white2gray(o); + switch (gch(o)->tt) { + case LUA_TSHRSTR: + case LUA_TLNGSTR: { + size = sizestring(gco2ts(o)); + break; /* nothing else to mark; make it black */ + } + case LUA_TUSERDATA: { + Table *mt = gco2u(o)->metatable; + markobject(g, mt); + markobject(g, gco2u(o)->env); + size = sizeudata(gco2u(o)); + break; + } + case LUA_TUPVAL: { + UpVal *uv = gco2uv(o); + markvalue(g, uv->v); + if (uv->v != &uv->u.value) /* open? */ + return; /* open upvalues remain gray */ + size = sizeof(UpVal); + break; + } + case LUA_TLCL: { + gco2lcl(o)->gclist = g->gray; + g->gray = o; + return; + } + case LUA_TCCL: { + gco2ccl(o)->gclist = g->gray; + g->gray = o; + return; + } + case LUA_TTABLE: { + linktable(gco2t(o), &g->gray); + return; + } + case LUA_TTHREAD: { + gco2th(o)->gclist = g->gray; + g->gray = o; + return; + } + case LUA_TPROTO: { + gco2p(o)->gclist = g->gray; + g->gray = o; + return; + } + default: lua_assert(0); return; + } + gray2black(o); + g->GCmemtrav += size; +} + + +/* +** mark metamethods for basic types +*/ +static void markmt (global_State *g) { + int i; + for (i=0; i < LUA_NUMTAGS; i++) + markobject(g, g->mt[i]); +} + + +/* +** mark all objects in list of being-finalized +*/ +static void markbeingfnz (global_State *g) { + GCObject *o; + for (o = g->tobefnz; o != NULL; o = gch(o)->next) { + makewhite(g, o); + reallymarkobject(g, o); + } +} + + +/* +** mark all values stored in marked open upvalues. (See comment in +** 'lstate.h'.) 
+*/ +static void remarkupvals (global_State *g) { + UpVal *uv; + for (uv = g->uvhead.u.l.next; uv != &g->uvhead; uv = uv->u.l.next) { + if (isgray(obj2gco(uv))) + markvalue(g, uv->v); + } +} + + +/* +** mark root set and reset all gray lists, to start a new +** incremental (or full) collection +*/ +static void restartcollection (global_State *g) { + g->gray = g->grayagain = NULL; + g->weak = g->allweak = g->ephemeron = NULL; + markobject(g, g->mainthread); + markvalue(g, &g->l_registry); + markmt(g); + markbeingfnz(g); /* mark any finalizing object left from previous cycle */ +} + +/* }====================================================== */ + + +/* +** {====================================================== +** Traverse functions +** ======================================================= +*/ + +static void traverseweakvalue (global_State *g, Table *h) { + Node *n, *limit = gnodelast(h); + /* if there is array part, assume it may have white values (do not + traverse it just to check) */ + int hasclears = (h->sizearray > 0); + for (n = gnode(h, 0); n < limit; n++) { + checkdeadkey(n); + if (ttisnil(gval(n))) /* entry is empty? */ + removeentry(n); /* remove it */ + else { + lua_assert(!ttisnil(gkey(n))); + markvalue(g, gkey(n)); /* mark key */ + if (!hasclears && iscleared(g, gval(n))) /* is there a white value? */ + hasclears = 1; /* table will have to be cleared */ + } + } + if (hasclears) + linktable(h, &g->weak); /* has to be cleared later */ + else /* no white values */ + linktable(h, &g->grayagain); /* no need to clean */ +} + + +static int traverseephemeron (global_State *g, Table *h) { + int marked = 0; /* true if an object is marked in this traversal */ + int hasclears = 0; /* true if table has white keys */ + int prop = 0; /* true if table has entry "white-key -> white-value" */ + Node *n, *limit = gnodelast(h); + int i; + /* traverse array part (numeric keys are 'strong') */ + for (i = 0; i < h->sizearray; i++) { + if (valiswhite(&h->array[i])) { + marked = 1; + reallymarkobject(g, gcvalue(&h->array[i])); + } + } + /* traverse hash part */ + for (n = gnode(h, 0); n < limit; n++) { + checkdeadkey(n); + if (ttisnil(gval(n))) /* entry is empty? */ + removeentry(n); /* remove it */ + else if (iscleared(g, gkey(n))) { /* key is not marked (yet)? */ + hasclears = 1; /* table must be cleared */ + if (valiswhite(gval(n))) /* value not marked yet? */ + prop = 1; /* must propagate again */ + } + else if (valiswhite(gval(n))) { /* value not marked yet? */ + marked = 1; + reallymarkobject(g, gcvalue(gval(n))); /* mark it now */ + } + } + if (prop) + linktable(h, &g->ephemeron); /* have to propagate again */ + else if (hasclears) /* does table have white keys? */ + linktable(h, &g->allweak); /* may have to clean white keys */ + else /* no white keys */ + linktable(h, &g->grayagain); /* no need to clean */ + return marked; +} + + +static void traversestrongtable (global_State *g, Table *h) { + Node *n, *limit = gnodelast(h); + int i; + for (i = 0; i < h->sizearray; i++) /* traverse array part */ + markvalue(g, &h->array[i]); + for (n = gnode(h, 0); n < limit; n++) { /* traverse hash part */ + checkdeadkey(n); + if (ttisnil(gval(n))) /* entry is empty? 
*/ + removeentry(n); /* remove it */ + else { + lua_assert(!ttisnil(gkey(n))); + markvalue(g, gkey(n)); /* mark key */ + markvalue(g, gval(n)); /* mark value */ + } + } +} + + +static lu_mem traversetable (global_State *g, Table *h) { + const char *weakkey, *weakvalue; + const TValue *mode = gfasttm(g, h->metatable, TM_MODE); + markobject(g, h->metatable); + if (mode && ttisstring(mode) && /* is there a weak mode? */ + ((weakkey = strchr(svalue(mode), 'k')), + (weakvalue = strchr(svalue(mode), 'v')), + (weakkey || weakvalue))) { /* is really weak? */ + black2gray(obj2gco(h)); /* keep table gray */ + if (!weakkey) /* strong keys? */ + traverseweakvalue(g, h); + else if (!weakvalue) /* strong values? */ + traverseephemeron(g, h); + else /* all weak */ + linktable(h, &g->allweak); /* nothing to traverse now */ + } + else /* not weak */ + traversestrongtable(g, h); + return sizeof(Table) + sizeof(TValue) * h->sizearray + + sizeof(Node) * cast(size_t, sizenode(h)); +} + + +static int traverseproto (global_State *g, Proto *f) { + int i; + if (f->cache && iswhite(obj2gco(f->cache))) + f->cache = NULL; /* allow cache to be collected */ + markobject(g, f->source); + for (i = 0; i < f->sizek; i++) /* mark literals */ + markvalue(g, &f->k[i]); + for (i = 0; i < f->sizeupvalues; i++) /* mark upvalue names */ + markobject(g, f->upvalues[i].name); + for (i = 0; i < f->sizep; i++) /* mark nested protos */ + markobject(g, f->p[i]); + for (i = 0; i < f->sizelocvars; i++) /* mark local-variable names */ + markobject(g, f->locvars[i].varname); + return sizeof(Proto) + sizeof(Instruction) * f->sizecode + + sizeof(Proto *) * f->sizep + + sizeof(TValue) * f->sizek + + sizeof(int) * f->sizelineinfo + + sizeof(LocVar) * f->sizelocvars + + sizeof(Upvaldesc) * f->sizeupvalues; +} + + +static lu_mem traverseCclosure (global_State *g, CClosure *cl) { + int i; + for (i = 0; i < cl->nupvalues; i++) /* mark its upvalues */ + markvalue(g, &cl->upvalue[i]); + return sizeCclosure(cl->nupvalues); +} + +static lu_mem traverseLclosure (global_State *g, LClosure *cl) { + int i; + markobject(g, cl->p); /* mark its prototype */ + for (i = 0; i < cl->nupvalues; i++) /* mark its upvalues */ + markobject(g, cl->upvals[i]); + return sizeLclosure(cl->nupvalues); +} + + +static lu_mem traversestack (global_State *g, lua_State *th) { + StkId o = th->stack; + if (o == NULL) + return 1; /* stack not completely built yet */ + for (; o < th->top; o++) + markvalue(g, o); + if (g->gcstate == GCSatomic) { /* final traversal? */ + StkId lim = th->stack + th->stacksize; /* real end of stack */ + for (; o < lim; o++) /* clear not-marked stack slice */ + setnilvalue(o); + } + return sizeof(lua_State) + sizeof(TValue) * th->stacksize; +} + + +/* +** traverse one gray object, turning it to black (except for threads, +** which are always gray). 
+*/ +static void propagatemark (global_State *g) { + lu_mem size; + GCObject *o = g->gray; + lua_assert(isgray(o)); + gray2black(o); + switch (gch(o)->tt) { + case LUA_TTABLE: { + Table *h = gco2t(o); + g->gray = h->gclist; /* remove from 'gray' list */ + size = traversetable(g, h); + break; + } + case LUA_TLCL: { + LClosure *cl = gco2lcl(o); + g->gray = cl->gclist; /* remove from 'gray' list */ + size = traverseLclosure(g, cl); + break; + } + case LUA_TCCL: { + CClosure *cl = gco2ccl(o); + g->gray = cl->gclist; /* remove from 'gray' list */ + size = traverseCclosure(g, cl); + break; + } + case LUA_TTHREAD: { + lua_State *th = gco2th(o); + g->gray = th->gclist; /* remove from 'gray' list */ + th->gclist = g->grayagain; + g->grayagain = o; /* insert into 'grayagain' list */ + black2gray(o); + size = traversestack(g, th); + break; + } + case LUA_TPROTO: { + Proto *p = gco2p(o); + g->gray = p->gclist; /* remove from 'gray' list */ + size = traverseproto(g, p); + break; + } + default: lua_assert(0); return; + } + g->GCmemtrav += size; +} + + +static void propagateall (global_State *g) { + while (g->gray) propagatemark(g); +} + + +static void propagatelist (global_State *g, GCObject *l) { + lua_assert(g->gray == NULL); /* no grays left */ + g->gray = l; + propagateall(g); /* traverse all elements from 'l' */ +} + +/* +** retraverse all gray lists. Because tables may be reinserted in other +** lists when traversed, traverse the original lists to avoid traversing +** twice the same table (which is not wrong, but inefficient) +*/ +static void retraversegrays (global_State *g) { + GCObject *weak = g->weak; /* save original lists */ + GCObject *grayagain = g->grayagain; + GCObject *ephemeron = g->ephemeron; + g->weak = g->grayagain = g->ephemeron = NULL; + propagateall(g); /* traverse main gray list */ + propagatelist(g, grayagain); + propagatelist(g, weak); + propagatelist(g, ephemeron); +} + + +static void convergeephemerons (global_State *g) { + int changed; + do { + GCObject *w; + GCObject *next = g->ephemeron; /* get ephemeron list */ + g->ephemeron = NULL; /* tables will return to this list when traversed */ + changed = 0; + while ((w = next) != NULL) { + next = gco2t(w)->gclist; + if (traverseephemeron(g, gco2t(w))) { /* traverse marked some value? */ + propagateall(g); /* propagate changes */ + changed = 1; /* will have to revisit all ephemeron tables */ + } + } + } while (changed); +} + +/* }====================================================== */ + + +/* +** {====================================================== +** Sweep Functions +** ======================================================= +*/ + + +/* +** clear entries with unmarked keys from all weaktables in list 'l' up +** to element 'f' +*/ +static void clearkeys (global_State *g, GCObject *l, GCObject *f) { + for (; l != f; l = gco2t(l)->gclist) { + Table *h = gco2t(l); + Node *n, *limit = gnodelast(h); + for (n = gnode(h, 0); n < limit; n++) { + if (!ttisnil(gval(n)) && (iscleared(g, gkey(n)))) { + setnilvalue(gval(n)); /* remove value ... */ + removeentry(n); /* and remove entry from table */ + } + } + } +} + + +/* +** clear entries with unmarked values from all weaktables in list 'l' up +** to element 'f' +*/ +static void clearvalues (global_State *g, GCObject *l, GCObject *f) { + for (; l != f; l = gco2t(l)->gclist) { + Table *h = gco2t(l); + Node *n, *limit = gnodelast(h); + int i; + for (i = 0; i < h->sizearray; i++) { + TValue *o = &h->array[i]; + if (iscleared(g, o)) /* value was collected? 
*/ + setnilvalue(o); /* remove value */ + } + for (n = gnode(h, 0); n < limit; n++) { + if (!ttisnil(gval(n)) && iscleared(g, gval(n))) { + setnilvalue(gval(n)); /* remove value ... */ + removeentry(n); /* and remove entry from table */ + } + } + } +} + + +static void freeobj (lua_State *L, GCObject *o) { + switch (gch(o)->tt) { + case LUA_TPROTO: luaF_freeproto(L, gco2p(o)); break; + case LUA_TLCL: { + luaM_freemem(L, o, sizeLclosure(gco2lcl(o)->nupvalues)); + break; + } + case LUA_TCCL: { + luaM_freemem(L, o, sizeCclosure(gco2ccl(o)->nupvalues)); + break; + } + case LUA_TUPVAL: luaF_freeupval(L, gco2uv(o)); break; + case LUA_TTABLE: luaH_free(L, gco2t(o)); break; + case LUA_TTHREAD: luaE_freethread(L, gco2th(o)); break; + case LUA_TUSERDATA: luaM_freemem(L, o, sizeudata(gco2u(o))); break; + case LUA_TSHRSTR: + G(L)->strt.nuse--; + /* go through */ + case LUA_TLNGSTR: { + luaM_freemem(L, o, sizestring(gco2ts(o))); + break; + } + default: lua_assert(0); + } +} + + +#define sweepwholelist(L,p) sweeplist(L,p,MAX_LUMEM) +static GCObject **sweeplist (lua_State *L, GCObject **p, lu_mem count); + + +/* +** sweep the (open) upvalues of a thread and resize its stack and +** list of call-info structures. +*/ +static void sweepthread (lua_State *L, lua_State *L1) { + if (L1->stack == NULL) return; /* stack not completely built yet */ + sweepwholelist(L, &L1->openupval); /* sweep open upvalues */ + luaE_freeCI(L1); /* free extra CallInfo slots */ + /* should not change the stack during an emergency gc cycle */ + if (G(L)->gckind != KGC_EMERGENCY) + luaD_shrinkstack(L1); +} + + +/* +** sweep at most 'count' elements from a list of GCObjects erasing dead +** objects, where a dead (not alive) object is one marked with the "old" +** (non current) white and not fixed. +** In non-generational mode, change all non-dead objects back to white, +** preparing for next collection cycle. +** In generational mode, keep black objects black, and also mark them as +** old; stop when hitting an old object, as all objects after that +** one will be old too. +** When object is a thread, sweep its list of open upvalues too. +*/ +static GCObject **sweeplist (lua_State *L, GCObject **p, lu_mem count) { + global_State *g = G(L); + int ow = otherwhite(g); + int toclear, toset; /* bits to clear and to set in all live objects */ + int tostop; /* stop sweep when this is true */ + if (isgenerational(g)) { /* generational mode? */ + toclear = ~0; /* clear nothing */ + toset = bitmask(OLDBIT); /* set the old bit of all surviving objects */ + tostop = bitmask(OLDBIT); /* do not sweep old generation */ + } + else { /* normal mode */ + toclear = maskcolors; /* clear all color bits + old bit */ + toset = luaC_white(g); /* make object white */ + tostop = 0; /* do not stop */ + } + while (*p != NULL && count-- > 0) { + GCObject *curr = *p; + int marked = gch(curr)->marked; + if (isdeadm(ow, marked)) { /* is 'curr' dead? */ + *p = gch(curr)->next; /* remove 'curr' from list */ + freeobj(L, curr); /* erase 'curr' */ + } + else { + if (testbits(marked, tostop)) + return NULL; /* stop sweeping this list */ + if (gch(curr)->tt == LUA_TTHREAD) + sweepthread(L, gco2th(curr)); /* sweep thread's upvalues */ + /* update marks */ + gch(curr)->marked = cast_byte((marked & toclear) | toset); + p = &gch(curr)->next; /* go to next element */ + } + } + return (*p == NULL) ? 
NULL : p; +} + + +/* +** sweep a list until a live object (or end of list) +*/ +static GCObject **sweeptolive (lua_State *L, GCObject **p, int *n) { + GCObject ** old = p; + int i = 0; + do { + i++; + p = sweeplist(L, p, 1); + } while (p == old); + if (n) *n += i; + return p; +} + +/* }====================================================== */ + + +/* +** {====================================================== +** Finalization +** ======================================================= +*/ + +static void checkSizes (lua_State *L) { + global_State *g = G(L); + if (g->gckind != KGC_EMERGENCY) { /* do not change sizes in emergency */ + int hs = g->strt.size / 2; /* half the size of the string table */ + if (g->strt.nuse < cast(lu_int32, hs)) /* using less than that half? */ + luaS_resize(L, hs); /* halve its size */ + luaZ_freebuffer(L, &g->buff); /* free concatenation buffer */ + } +} + + +static GCObject *udata2finalize (global_State *g) { + GCObject *o = g->tobefnz; /* get first element */ + lua_assert(isfinalized(o)); + g->tobefnz = gch(o)->next; /* remove it from 'tobefnz' list */ + gch(o)->next = g->allgc; /* return it to 'allgc' list */ + g->allgc = o; + resetbit(gch(o)->marked, SEPARATED); /* mark that it is not in 'tobefnz' */ + lua_assert(!isold(o)); /* see MOVE OLD rule */ + if (!keepinvariantout(g)) /* not keeping invariant? */ + makewhite(g, o); /* "sweep" object */ + return o; +} + + +static void dothecall (lua_State *L, void *ud) { + UNUSED(ud); + luaD_call(L, L->top - 2, 0, 0); +} + + +static void GCTM (lua_State *L, int propagateerrors) { + global_State *g = G(L); + const TValue *tm; + TValue v; + setgcovalue(L, &v, udata2finalize(g)); + tm = luaT_gettmbyobj(L, &v, TM_GC); + if (tm != NULL && ttisfunction(tm)) { /* is there a finalizer? */ + int status; + lu_byte oldah = L->allowhook; + int running = g->gcrunning; + L->allowhook = 0; /* stop debug hooks during GC metamethod */ + g->gcrunning = 0; /* avoid GC steps */ + setobj2s(L, L->top, tm); /* push finalizer... */ + setobj2s(L, L->top + 1, &v); /* ... and its argument */ + L->top += 2; /* and (next line) call the finalizer */ + status = luaD_pcall(L, dothecall, NULL, savestack(L, L->top - 2), 0); + L->allowhook = oldah; /* restore hooks */ + g->gcrunning = running; /* restore state */ + if (status != LUA_OK && propagateerrors) { /* error while running __gc? */ + if (status == LUA_ERRRUN) { /* is there an error object? */ + const char *msg = (ttisstring(L->top - 1)) + ? svalue(L->top - 1) + : "no message"; + luaO_pushfstring(L, "error in __gc metamethod (%s)", msg); + status = LUA_ERRGCMM; /* error in __gc metamethod */ + } + luaD_throw(L, status); /* re-throw error */ + } + } +} + + +/* +** move all unreachable objects (or 'all' objects) that need +** finalization from list 'finobj' to list 'tobefnz' (to be finalized) +*/ +static void separatetobefnz (lua_State *L, int all) { + global_State *g = G(L); + GCObject **p = &g->finobj; + GCObject *curr; + GCObject **lastnext = &g->tobefnz; + /* find last 'next' field in 'tobefnz' list (to add elements in its end) */ + while (*lastnext != NULL) + lastnext = &gch(*lastnext)->next; + while ((curr = *p) != NULL) { /* traverse all finalizable objects */ + lua_assert(!isfinalized(curr)); + lua_assert(testbit(gch(curr)->marked, SEPARATED)); + if (!(iswhite(curr) || all)) /* not being collected? 
*/ + p = &gch(curr)->next; /* don't bother with it */ + else { + l_setbit(gch(curr)->marked, FINALIZEDBIT); /* won't be finalized again */ + *p = gch(curr)->next; /* remove 'curr' from 'finobj' list */ + gch(curr)->next = *lastnext; /* link at the end of 'tobefnz' list */ + *lastnext = curr; + lastnext = &gch(curr)->next; + } + } +} + + +/* +** if object 'o' has a finalizer, remove it from 'allgc' list (must +** search the list to find it) and link it in 'finobj' list. +*/ +void luaC_checkfinalizer (lua_State *L, GCObject *o, Table *mt) { + global_State *g = G(L); + if (testbit(gch(o)->marked, SEPARATED) || /* obj. is already separated... */ + isfinalized(o) || /* ... or is finalized... */ + gfasttm(g, mt, TM_GC) == NULL) /* or has no finalizer? */ + return; /* nothing to be done */ + else { /* move 'o' to 'finobj' list */ + GCObject **p; + GCheader *ho = gch(o); + if (g->sweepgc == &ho->next) { /* avoid removing current sweep object */ + lua_assert(issweepphase(g)); + g->sweepgc = sweeptolive(L, g->sweepgc, NULL); + } + /* search for pointer pointing to 'o' */ + for (p = &g->allgc; *p != o; p = &gch(*p)->next) { /* empty */ } + *p = ho->next; /* remove 'o' from root list */ + ho->next = g->finobj; /* link it in list 'finobj' */ + g->finobj = o; + l_setbit(ho->marked, SEPARATED); /* mark it as such */ + if (!keepinvariantout(g)) /* not keeping invariant? */ + makewhite(g, o); /* "sweep" object */ + else + resetoldbit(o); /* see MOVE OLD rule */ + } +} + +/* }====================================================== */ + + +/* +** {====================================================== +** GC control +** ======================================================= +*/ + + +/* +** set a reasonable "time" to wait before starting a new GC cycle; +** cycle will start when memory use hits threshold +*/ +static void setpause (global_State *g, l_mem estimate) { + l_mem debt, threshold; + estimate = estimate / PAUSEADJ; /* adjust 'estimate' */ + threshold = (g->gcpause < MAX_LMEM / estimate) /* overflow? */ + ? estimate * g->gcpause /* no overflow */ + : MAX_LMEM; /* overflow; truncate to maximum */ + debt = -cast(l_mem, threshold - gettotalbytes(g)); + luaE_setdebt(g, debt); +} + + +#define sweepphases \ + (bitmask(GCSsweepstring) | bitmask(GCSsweepudata) | bitmask(GCSsweep)) + + +/* +** enter first sweep phase (strings) and prepare pointers for other +** sweep phases. The calls to 'sweeptolive' make pointers point to an +** object inside the list (instead of to the header), so that the real +** sweep do not need to skip objects created between "now" and the start +** of the real sweep. +** Returns how many objects it swept. 
+*/ +static int entersweep (lua_State *L) { + global_State *g = G(L); + int n = 0; + g->gcstate = GCSsweepstring; + lua_assert(g->sweepgc == NULL && g->sweepfin == NULL); + /* prepare to sweep strings, finalizable objects, and regular objects */ + g->sweepstrgc = 0; + g->sweepfin = sweeptolive(L, &g->finobj, &n); + g->sweepgc = sweeptolive(L, &g->allgc, &n); + return n; +} + + +/* +** change GC mode +*/ +void luaC_changemode (lua_State *L, int mode) { + global_State *g = G(L); + if (mode == g->gckind) return; /* nothing to change */ + if (mode == KGC_GEN) { /* change to generational mode */ + /* make sure gray lists are consistent */ + luaC_runtilstate(L, bitmask(GCSpropagate)); + g->GCestimate = gettotalbytes(g); + g->gckind = KGC_GEN; + } + else { /* change to incremental mode */ + /* sweep all objects to turn them back to white + (as white has not changed, nothing extra will be collected) */ + g->gckind = KGC_NORMAL; + entersweep(L); + luaC_runtilstate(L, ~sweepphases); + } +} + + +/* +** call all pending finalizers +*/ +static void callallpendingfinalizers (lua_State *L, int propagateerrors) { + global_State *g = G(L); + while (g->tobefnz) { + resetoldbit(g->tobefnz); + GCTM(L, propagateerrors); + } +} + + +void luaC_freeallobjects (lua_State *L) { + global_State *g = G(L); + int i; + separatetobefnz(L, 1); /* separate all objects with finalizers */ + lua_assert(g->finobj == NULL); + callallpendingfinalizers(L, 0); + g->currentwhite = WHITEBITS; /* this "white" makes all objects look dead */ + g->gckind = KGC_NORMAL; + sweepwholelist(L, &g->finobj); /* finalizers can create objs. in 'finobj' */ + sweepwholelist(L, &g->allgc); + for (i = 0; i < g->strt.size; i++) /* free all string lists */ + sweepwholelist(L, &g->strt.hash[i]); + lua_assert(g->strt.nuse == 0); +} + + +static l_mem atomic (lua_State *L) { + global_State *g = G(L); + l_mem work = -cast(l_mem, g->GCmemtrav); /* start counting work */ + GCObject *origweak, *origall; + lua_assert(!iswhite(obj2gco(g->mainthread))); + markobject(g, L); /* mark running thread */ + /* registry and global metatables may be changed by API */ + markvalue(g, &g->l_registry); + markmt(g); /* mark basic metatables */ + /* remark occasional upvalues of (maybe) dead threads */ + remarkupvals(g); + propagateall(g); /* propagate changes */ + work += g->GCmemtrav; /* stop counting (do not (re)count grays) */ + /* traverse objects caught by write barrier and by 'remarkupvals' */ + retraversegrays(g); + work -= g->GCmemtrav; /* restart counting */ + convergeephemerons(g); + /* at this point, all strongly accessible objects are marked. */ + /* clear values from weak tables, before checking finalizers */ + clearvalues(g, g->weak, NULL); + clearvalues(g, g->allweak, NULL); + origweak = g->weak; origall = g->allweak; + work += g->GCmemtrav; /* stop counting (objects being finalized) */ + separatetobefnz(L, 0); /* separate objects to be finalized */ + markbeingfnz(g); /* mark objects that will be finalized */ + propagateall(g); /* remark, to propagate `preserveness' */ + work -= g->GCmemtrav; /* restart counting */ + convergeephemerons(g); + /* at this point, all resurrected objects are marked. 
*/
+  /* remove dead objects from weak tables */
+  clearkeys(g, g->ephemeron, NULL);  /* clear keys from all ephemeron tables */
+  clearkeys(g, g->allweak, NULL);  /* clear keys from all allweak tables */
+  /* clear values from resurrected weak tables */
+  clearvalues(g, g->weak, origweak);
+  clearvalues(g, g->allweak, origall);
+  g->currentwhite = cast_byte(otherwhite(g));  /* flip current white */
+  work += g->GCmemtrav;  /* complete counting */
+  return work;  /* estimate of memory marked by 'atomic' */
+}
+
+
+static lu_mem singlestep (lua_State *L) {
+  global_State *g = G(L);
+  switch (g->gcstate) {
+    case GCSpause: {
+      /* start to count memory traversed */
+      g->GCmemtrav = g->strt.size * sizeof(GCObject*);
+      lua_assert(!isgenerational(g));
+      restartcollection(g);
+      g->gcstate = GCSpropagate;
+      return g->GCmemtrav;
+    }
+    case GCSpropagate: {
+      if (g->gray) {
+        lu_mem oldtrav = g->GCmemtrav;
+        propagatemark(g);
+        return g->GCmemtrav - oldtrav;  /* memory traversed in this step */
+      }
+      else {  /* no more `gray' objects */
+        lu_mem work;
+        int sw;
+        g->gcstate = GCSatomic;  /* finish mark phase */
+        g->GCestimate = g->GCmemtrav;  /* save what was counted */
+        work = atomic(L);  /* add what was traversed by 'atomic' */
+        g->GCestimate += work;  /* estimate of total memory traversed */
+        sw = entersweep(L);
+        return work + sw * GCSWEEPCOST;
+      }
+    }
+    case GCSsweepstring: {
+      int i;
+      for (i = 0; i < GCSWEEPMAX && g->sweepstrgc + i < g->strt.size; i++)
+        sweepwholelist(L, &g->strt.hash[g->sweepstrgc + i]);
+      g->sweepstrgc += i;
+      if (g->sweepstrgc >= g->strt.size)  /* no more strings to sweep? */
+        g->gcstate = GCSsweepudata;
+      return i * GCSWEEPCOST;
+    }
+    case GCSsweepudata: {
+      if (g->sweepfin) {
+        g->sweepfin = sweeplist(L, g->sweepfin, GCSWEEPMAX);
+        return GCSWEEPMAX*GCSWEEPCOST;
+      }
+      else {
+        g->gcstate = GCSsweep;
+        return 0;
+      }
+    }
+    case GCSsweep: {
+      if (g->sweepgc) {
+        g->sweepgc = sweeplist(L, g->sweepgc, GCSWEEPMAX);
+        return GCSWEEPMAX*GCSWEEPCOST;
+      }
+      else {
+        /* sweep main thread */
+        GCObject *mt = obj2gco(g->mainthread);
+        sweeplist(L, &mt, 1);
+        checkSizes(L);
+        g->gcstate = GCSpause;  /* finish collection */
+        return GCSWEEPCOST;
+      }
+    }
+    default: lua_assert(0); return 0;
+  }
+}
+
+
+/*
+** advances the garbage collector until it reaches a state allowed
+** by 'statemask'
+*/
+void luaC_runtilstate (lua_State *L, int statesmask) {
+  global_State *g = G(L);
+  while (!testbit(statesmask, g->gcstate))
+    singlestep(L);
+}
+
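singlestep and luaC_runtilstate are also what run when the host steps the collector by hand through lua_gc. A sketch of manual stepping with the standard 5.2 options; the chunk that creates garbage is illustrative:

#include <stdio.h>
#include "lua.h"
#include "lauxlib.h"

int main (void) {
  lua_State *L = luaL_newstate();
  lua_gc(L, LUA_GCSTOP, 0);        /* suspend automatic luaC_step calls */
  luaL_dostring(L, "local t = {} for i = 1, 1000 do t[i] = {} end");
  while (lua_gc(L, LUA_GCSTEP, 1) == 0)  /* drives singlestep; reports 1 */
    ;                                    /* when a cycle reaches GCSpause */
  printf("in use after a full cycle: %d KB\n", lua_gc(L, LUA_GCCOUNT, 0));
  lua_close(L);
  return 0;
}
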
+static void generationalcollection (lua_State *L) {
+  global_State *g = G(L);
+  lua_assert(g->gcstate == GCSpropagate);
+  if (g->GCestimate == 0) {  /* signal for another major collection? */
+    luaC_fullgc(L, 0);  /* perform a full regular collection */
+    g->GCestimate = gettotalbytes(g);  /* update control */
+  }
+  else {
+    lu_mem estimate = g->GCestimate;
+    luaC_runtilstate(L, bitmask(GCSpause));  /* run complete (minor) cycle */
+    g->gcstate = GCSpropagate;  /* skip restart */
+    if (gettotalbytes(g) > (estimate / 100) * g->gcmajorinc)
+      g->GCestimate = 0;  /* signal for a major collection */
+    else
+      g->GCestimate = estimate;  /* keep estimate from last major coll. */
+
+  }
+  setpause(g, gettotalbytes(g));
+  lua_assert(g->gcstate == GCSpropagate);
+}
+
+
+static void incstep (lua_State *L) {
+  global_State *g = G(L);
+  l_mem debt = g->GCdebt;
+  int stepmul = g->gcstepmul;
+  if (stepmul < 40) stepmul = 40;  /* avoid ridiculously low values (and 0) */
+  /* convert debt from Kb to 'work units' (avoid zero debt and overflows) */
+  debt = (debt / STEPMULADJ) + 1;
+  debt = (debt < MAX_LMEM / stepmul) ? debt * stepmul : MAX_LMEM;
+  do {  /* always perform at least one single step */
+    lu_mem work = singlestep(L);  /* do some work */
+    debt -= work;
+  } while (debt > -GCSTEPSIZE && g->gcstate != GCSpause);
+  if (g->gcstate == GCSpause)
+    setpause(g, g->GCestimate);  /* pause until next cycle */
+  else {
+    debt = (debt / stepmul) * STEPMULADJ;  /* convert 'work units' to Kb */
+    luaE_setdebt(g, debt);
+  }
+}
+
+
+/*
+** performs a basic GC step
+*/
+void luaC_forcestep (lua_State *L) {
+  global_State *g = G(L);
+  int i;
+  if (isgenerational(g)) generationalcollection(L);
+  else incstep(L);
+  /* run a few finalizers (or all of them at the end of a collect cycle) */
+  for (i = 0; g->tobefnz && (i < GCFINALIZENUM || g->gcstate == GCSpause); i++)
+    GCTM(L, 1);  /* call one finalizer */
+}
+
+
+/*
+** performs a basic GC step only if collector is running
+*/
+void luaC_step (lua_State *L) {
+  global_State *g = G(L);
+  if (g->gcrunning) luaC_forcestep(L);
+  else luaE_setdebt(g, -GCSTEPSIZE);  /* avoid being called too often */
+}
+
+
+
+/*
+** performs a full GC cycle; if "isemergency", does not call
+** finalizers (which could change stack positions)
+*/
+void luaC_fullgc (lua_State *L, int isemergency) {
+  global_State *g = G(L);
+  int origkind = g->gckind;
+  lua_assert(origkind != KGC_EMERGENCY);
+  if (isemergency)  /* do not run finalizers during emergency GC */
+    g->gckind = KGC_EMERGENCY;
+  else {
+    g->gckind = KGC_NORMAL;
+    callallpendingfinalizers(L, 1);
+  }
+  if (keepinvariant(g)) {  /* may there be some black objects? */
+    /* must sweep all objects to turn them back to white
+       (as white has not changed, nothing will be collected) */
+    entersweep(L);
+  }
+  /* finish any pending sweep phase to start a new cycle */
+  luaC_runtilstate(L, bitmask(GCSpause));
+  luaC_runtilstate(L, ~bitmask(GCSpause));  /* start new collection */
+  luaC_runtilstate(L, bitmask(GCSpause));  /* run entire collection */
+  if (origkind == KGC_GEN) {  /* generational mode? */
+    /* generational mode must be kept in propagate phase */
+    luaC_runtilstate(L, bitmask(GCSpropagate));
+  }
+  g->gckind = origkind;
+  setpause(g, gettotalbytes(g));
+  if (!isemergency)   /* do not run finalizers during emergency GC */
+    callallpendingfinalizers(L, 1);
+}
+
+/* }====================================================== */
+
+
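The mode switch in luaC_changemode and the full cycle in luaC_fullgc surface through lua_gc as well. A sketch of toggling generational mode and forcing a complete collection, assuming only the standard 5.2 constants:

#include "lua.h"
#include "lauxlib.h"

int main (void) {
  lua_State *L = luaL_newstate();
  lua_gc(L, LUA_GCGEN, 0);           /* luaC_changemode(L, KGC_GEN) */
  /* ... allocations here are reclaimed by minor (generational) cycles ... */
  lua_gc(L, LUA_GCINC, 0);           /* back to incremental (KGC_NORMAL) */
  lua_gc(L, LUA_GCSETPAUSE, 200);    /* new cycle at 2x use, cf. PAUSEADJ */
  lua_gc(L, LUA_GCSETSTEPMUL, 200);  /* work per step, cf. STEPMULADJ */
  lua_gc(L, LUA_GCCOLLECT, 0);       /* luaC_fullgc(L, 0) */
  lua_close(L);
  return 0;
}
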
diff --git a/ext/lua/src/linit.c b/ext/lua/src/linit.c
new file mode 100644
index 000000000..8d3aa6576
--- /dev/null
+++ b/ext/lua/src/linit.c
@@ -0,0 +1,67 @@
+/*
+** $Id: linit.c,v 1.32 2011/04/08 19:17:36 roberto Exp $
+** Initialization of libraries for lua.c and other clients
+** See Copyright Notice in lua.h
+*/
+
+
+/*
+** If you embed Lua in your program and need to open the standard
+** libraries, call luaL_openlibs in your program. If you need a
+** different set of libraries, copy this file to your project and edit
+** it to suit your needs.
+*/
+
+
+#define linit_c
+#define LUA_LIB
+
+#include "lua.h"
+
+#include "lualib.h"
+#include "lauxlib.h"
+
+
+/*
+** these libs are loaded by lua.c and are readily available to any Lua
+** program
+*/
+static const luaL_Reg loadedlibs[] = {
+  {"_G", luaopen_base},
+  {LUA_LOADLIBNAME, luaopen_package},
+  {LUA_COLIBNAME, luaopen_coroutine},
+  {LUA_TABLIBNAME, luaopen_table},
+  {LUA_IOLIBNAME, luaopen_io},
+  {LUA_OSLIBNAME, luaopen_os},
+  {LUA_STRLIBNAME, luaopen_string},
+  {LUA_BITLIBNAME, luaopen_bit32},
+  {LUA_MATHLIBNAME, luaopen_math},
+  {LUA_DBLIBNAME, luaopen_debug},
+  {NULL, NULL}
+};
+
+
+/*
+** these libs are preloaded and must be required before used
+*/
+static const luaL_Reg preloadedlibs[] = {
+  {NULL, NULL}
+};
+
+
+LUALIB_API void luaL_openlibs (lua_State *L) {
+  const luaL_Reg *lib;
+  /* call open functions from 'loadedlibs' and set results to global table */
+  for (lib = loadedlibs; lib->func; lib++) {
+    luaL_requiref(L, lib->name, lib->func, 1);
+    lua_pop(L, 1);  /* remove lib */
+  }
+  /* add open functions from 'preloadedlibs' into 'package.preload' table */
+  luaL_getsubtable(L, LUA_REGISTRYINDEX, "_PRELOAD");
+  for (lib = preloadedlibs; lib->func; lib++) {
+    lua_pushcfunction(L, lib->func);
+    lua_setfield(L, -2, lib->name);
+  }
+  lua_pop(L, 1);  /* remove _PRELOAD table */
+}
+
diff --git a/ext/lua/src/liolib.c b/ext/lua/src/liolib.c
new file mode 100644
index 000000000..3f80db192
--- /dev/null
+++ b/ext/lua/src/liolib.c
@@ -0,0 +1,665 @@
+/*
+** $Id: liolib.c,v 2.111 2013/03/21 13:57:27 roberto Exp $
+** Standard I/O (and system) library
+** See Copyright Notice in lua.h
+*/
+
+
+/*
+** POSIX idiosyncrasy!
+** This definition must come before the inclusion of 'stdio.h'; it
+** should not affect non-POSIX systems
+*/
+#if !defined(_FILE_OFFSET_BITS)
+#define _FILE_OFFSET_BITS 64
+#endif
+
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define liolib_c
+#define LUA_LIB
+
+#include "lua.h"
+
+#include "lauxlib.h"
+#include "lualib.h"
+
+
+#if !defined(lua_checkmode)
+
+/*
+** Check whether 'mode' matches '[rwa]%+?b?'.
+** Change this macro to accept other modes for 'fopen' besides
+** the standard ones.
+*/
+#define lua_checkmode(mode) \
+	(*mode != '\0' && strchr("rwa", *(mode++)) != NULL &&	\
+	(*mode != '+' || ++mode) &&  /* skip if char is '+' */	\
+	(*mode != 'b' || ++mode) &&  /* skip if char is 'b' */	\
+	(*mode == '\0'))
+
+#endif
+
+/*
+** {======================================================
+** lua_popen spawns a new process connected to the current
+** one through the file streams. 
+** ======================================================= +*/ + +#if !defined(lua_popen) /* { */ + +#if defined(LUA_USE_POPEN) /* { */ + +#define lua_popen(L,c,m) ((void)L, fflush(NULL), popen(c,m)) +#define lua_pclose(L,file) ((void)L, pclose(file)) + +#elif defined(LUA_WIN) /* }{ */ + +#define lua_popen(L,c,m) ((void)L, _popen(c,m)) +#define lua_pclose(L,file) ((void)L, _pclose(file)) + + +#else /* }{ */ + +#define lua_popen(L,c,m) ((void)((void)c, m), \ + luaL_error(L, LUA_QL("popen") " not supported"), (FILE*)0) +#define lua_pclose(L,file) ((void)((void)L, file), -1) + + +#endif /* } */ + +#endif /* } */ + +/* }====================================================== */ + + +/* +** {====================================================== +** lua_fseek/lua_ftell: configuration for longer offsets +** ======================================================= +*/ + +#if !defined(lua_fseek) /* { */ + +#if defined(LUA_USE_POSIX) + +#define l_fseek(f,o,w) fseeko(f,o,w) +#define l_ftell(f) ftello(f) +#define l_seeknum off_t + +#elif defined(LUA_WIN) && !defined(_CRTIMP_TYPEINFO) \ + && defined(_MSC_VER) && (_MSC_VER >= 1400) +/* Windows (but not DDK) and Visual C++ 2005 or higher */ + +#define l_fseek(f,o,w) _fseeki64(f,o,w) +#define l_ftell(f) _ftelli64(f) +#define l_seeknum __int64 + +#else + +#define l_fseek(f,o,w) fseek(f,o,w) +#define l_ftell(f) ftell(f) +#define l_seeknum long + +#endif + +#endif /* } */ + +/* }====================================================== */ + + +#define IO_PREFIX "_IO_" +#define IO_INPUT (IO_PREFIX "input") +#define IO_OUTPUT (IO_PREFIX "output") + + +typedef luaL_Stream LStream; + + +#define tolstream(L) ((LStream *)luaL_checkudata(L, 1, LUA_FILEHANDLE)) + +#define isclosed(p) ((p)->closef == NULL) + + +static int io_type (lua_State *L) { + LStream *p; + luaL_checkany(L, 1); + p = (LStream *)luaL_testudata(L, 1, LUA_FILEHANDLE); + if (p == NULL) + lua_pushnil(L); /* not a file */ + else if (isclosed(p)) + lua_pushliteral(L, "closed file"); + else + lua_pushliteral(L, "file"); + return 1; +} + + +static int f_tostring (lua_State *L) { + LStream *p = tolstream(L); + if (isclosed(p)) + lua_pushliteral(L, "file (closed)"); + else + lua_pushfstring(L, "file (%p)", p->f); + return 1; +} + + +static FILE *tofile (lua_State *L) { + LStream *p = tolstream(L); + if (isclosed(p)) + luaL_error(L, "attempt to use a closed file"); + lua_assert(p->f); + return p->f; +} + + +/* +** When creating file handles, always creates a `closed' file handle +** before opening the actual file; so, if there is a memory error, the +** file is not left opened. +*/ +static LStream *newprefile (lua_State *L) { + LStream *p = (LStream *)lua_newuserdata(L, sizeof(LStream)); + p->closef = NULL; /* mark file handle as 'closed' */ + luaL_setmetatable(L, LUA_FILEHANDLE); + return p; +} + + +static int aux_close (lua_State *L) { + LStream *p = tolstream(L); + lua_CFunction cf = p->closef; + p->closef = NULL; /* mark stream as closed */ + return (*cf)(L); /* close it */ +} + + +static int io_close (lua_State *L) { + if (lua_isnone(L, 1)) /* no argument? 
*/ + lua_getfield(L, LUA_REGISTRYINDEX, IO_OUTPUT); /* use standard output */ + tofile(L); /* make sure argument is an open stream */ + return aux_close(L); +} + + +static int f_gc (lua_State *L) { + LStream *p = tolstream(L); + if (!isclosed(p) && p->f != NULL) + aux_close(L); /* ignore closed and incompletely open files */ + return 0; +} + + +/* +** function to close regular files +*/ +static int io_fclose (lua_State *L) { + LStream *p = tolstream(L); + int res = fclose(p->f); + return luaL_fileresult(L, (res == 0), NULL); +} + + +static LStream *newfile (lua_State *L) { + LStream *p = newprefile(L); + p->f = NULL; + p->closef = &io_fclose; + return p; +} + + +static void opencheck (lua_State *L, const char *fname, const char *mode) { + LStream *p = newfile(L); + p->f = fopen(fname, mode); + if (p->f == NULL) + luaL_error(L, "cannot open file " LUA_QS " (%s)", fname, strerror(errno)); +} + + +static int io_open (lua_State *L) { + const char *filename = luaL_checkstring(L, 1); + const char *mode = luaL_optstring(L, 2, "r"); + LStream *p = newfile(L); + const char *md = mode; /* to traverse/check mode */ + luaL_argcheck(L, lua_checkmode(md), 2, "invalid mode"); + p->f = fopen(filename, mode); + return (p->f == NULL) ? luaL_fileresult(L, 0, filename) : 1; +} + + +/* +** function to close 'popen' files +*/ +static int io_pclose (lua_State *L) { + LStream *p = tolstream(L); + return luaL_execresult(L, lua_pclose(L, p->f)); +} + + +static int io_popen (lua_State *L) { + const char *filename = luaL_checkstring(L, 1); + const char *mode = luaL_optstring(L, 2, "r"); + LStream *p = newprefile(L); + p->f = lua_popen(L, filename, mode); + p->closef = &io_pclose; + return (p->f == NULL) ? luaL_fileresult(L, 0, filename) : 1; +} + + +static int io_tmpfile (lua_State *L) { + LStream *p = newfile(L); + p->f = tmpfile(); + return (p->f == NULL) ? 
luaL_fileresult(L, 0, NULL) : 1; +} + + +static FILE *getiofile (lua_State *L, const char *findex) { + LStream *p; + lua_getfield(L, LUA_REGISTRYINDEX, findex); + p = (LStream *)lua_touserdata(L, -1); + if (isclosed(p)) + luaL_error(L, "standard %s file is closed", findex + strlen(IO_PREFIX)); + return p->f; +} + + +static int g_iofile (lua_State *L, const char *f, const char *mode) { + if (!lua_isnoneornil(L, 1)) { + const char *filename = lua_tostring(L, 1); + if (filename) + opencheck(L, filename, mode); + else { + tofile(L); /* check that it's a valid file handle */ + lua_pushvalue(L, 1); + } + lua_setfield(L, LUA_REGISTRYINDEX, f); + } + /* return current value */ + lua_getfield(L, LUA_REGISTRYINDEX, f); + return 1; +} + + +static int io_input (lua_State *L) { + return g_iofile(L, IO_INPUT, "r"); +} + + +static int io_output (lua_State *L) { + return g_iofile(L, IO_OUTPUT, "w"); +} + + +static int io_readline (lua_State *L); + + +static void aux_lines (lua_State *L, int toclose) { + int i; + int n = lua_gettop(L) - 1; /* number of arguments to read */ + /* ensure that arguments will fit here and into 'io_readline' stack */ + luaL_argcheck(L, n <= LUA_MINSTACK - 3, LUA_MINSTACK - 3, "too many options"); + lua_pushvalue(L, 1); /* file handle */ + lua_pushinteger(L, n); /* number of arguments to read */ + lua_pushboolean(L, toclose); /* close/not close file when finished */ + for (i = 1; i <= n; i++) lua_pushvalue(L, i + 1); /* copy arguments */ + lua_pushcclosure(L, io_readline, 3 + n); +} + + +static int f_lines (lua_State *L) { + tofile(L); /* check that it's a valid file handle */ + aux_lines(L, 0); + return 1; +} + + +static int io_lines (lua_State *L) { + int toclose; + if (lua_isnone(L, 1)) lua_pushnil(L); /* at least one argument */ + if (lua_isnil(L, 1)) { /* no file name? */ + lua_getfield(L, LUA_REGISTRYINDEX, IO_INPUT); /* get default input */ + lua_replace(L, 1); /* put it at index 1 */ + tofile(L); /* check that it's a valid file handle */ + toclose = 0; /* do not close it after iteration */ + } + else { /* open a new file */ + const char *filename = luaL_checkstring(L, 1); + opencheck(L, filename, "r"); + lua_replace(L, 1); /* put file at index 1 */ + toclose = 1; /* close it after iteration */ + } + aux_lines(L, toclose); + return 1; +} + + +/* +** {====================================================== +** READ +** ======================================================= +*/ + + +static int read_number (lua_State *L, FILE *f) { + lua_Number d; + if (fscanf(f, LUA_NUMBER_SCAN, &d) == 1) { + lua_pushnumber(L, d); + return 1; + } + else { + lua_pushnil(L); /* "result" to be removed */ + return 0; /* read fails */ + } +} + + +static int test_eof (lua_State *L, FILE *f) { + int c = getc(f); + ungetc(c, f); + lua_pushlstring(L, NULL, 0); + return (c != EOF); +} + + +static int read_line (lua_State *L, FILE *f, int chop) { + luaL_Buffer b; + luaL_buffinit(L, &b); + for (;;) { + size_t l; + char *p = luaL_prepbuffer(&b); + if (fgets(p, LUAL_BUFFERSIZE, f) == NULL) { /* eof? 
*/ + luaL_pushresult(&b); /* close buffer */ + return (lua_rawlen(L, -1) > 0); /* check whether read something */ + } + l = strlen(p); + if (l == 0 || p[l-1] != '\n') + luaL_addsize(&b, l); + else { + luaL_addsize(&b, l - chop); /* chop 'eol' if needed */ + luaL_pushresult(&b); /* close buffer */ + return 1; /* read at least an `eol' */ + } + } +} + + +#define MAX_SIZE_T (~(size_t)0) + +static void read_all (lua_State *L, FILE *f) { + size_t rlen = LUAL_BUFFERSIZE; /* how much to read in each cycle */ + luaL_Buffer b; + luaL_buffinit(L, &b); + for (;;) { + char *p = luaL_prepbuffsize(&b, rlen); + size_t nr = fread(p, sizeof(char), rlen, f); + luaL_addsize(&b, nr); + if (nr < rlen) break; /* eof? */ + else if (rlen <= (MAX_SIZE_T / 4)) /* avoid buffers too large */ + rlen *= 2; /* double buffer size at each iteration */ + } + luaL_pushresult(&b); /* close buffer */ +} + + +static int read_chars (lua_State *L, FILE *f, size_t n) { + size_t nr; /* number of chars actually read */ + char *p; + luaL_Buffer b; + luaL_buffinit(L, &b); + p = luaL_prepbuffsize(&b, n); /* prepare buffer to read whole block */ + nr = fread(p, sizeof(char), n, f); /* try to read 'n' chars */ + luaL_addsize(&b, nr); + luaL_pushresult(&b); /* close buffer */ + return (nr > 0); /* true iff read something */ +} + + +static int g_read (lua_State *L, FILE *f, int first) { + int nargs = lua_gettop(L) - 1; + int success; + int n; + clearerr(f); + if (nargs == 0) { /* no arguments? */ + success = read_line(L, f, 1); + n = first+1; /* to return 1 result */ + } + else { /* ensure stack space for all results and for auxlib's buffer */ + luaL_checkstack(L, nargs+LUA_MINSTACK, "too many arguments"); + success = 1; + for (n = first; nargs-- && success; n++) { + if (lua_type(L, n) == LUA_TNUMBER) { + size_t l = (size_t)lua_tointeger(L, n); + success = (l == 0) ? test_eof(L, f) : read_chars(L, f, l); + } + else { + const char *p = lua_tostring(L, n); + luaL_argcheck(L, p && p[0] == '*', n, "invalid option"); + switch (p[1]) { + case 'n': /* number */ + success = read_number(L, f); + break; + case 'l': /* line */ + success = read_line(L, f, 1); + break; + case 'L': /* line with end-of-line */ + success = read_line(L, f, 0); + break; + case 'a': /* file */ + read_all(L, f); /* read entire file */ + success = 1; /* always success */ + break; + default: + return luaL_argerror(L, n, "invalid format"); + } + } + } + } + if (ferror(f)) + return luaL_fileresult(L, 0, NULL); + if (!success) { + lua_pop(L, 1); /* remove last result */ + lua_pushnil(L); /* push nil instead */ + } + return n - first; +} + + +static int io_read (lua_State *L) { + return g_read(L, getiofile(L, IO_INPUT), 1); +} + + +static int f_read (lua_State *L) { + return g_read(L, tofile(L), 2); +} + + +static int io_readline (lua_State *L) { + LStream *p = (LStream *)lua_touserdata(L, lua_upvalueindex(1)); + int i; + int n = (int)lua_tointeger(L, lua_upvalueindex(2)); + if (isclosed(p)) /* file is already closed? */ + return luaL_error(L, "file is already closed"); + lua_settop(L , 1); + for (i = 1; i <= n; i++) /* push arguments to 'g_read' */ + lua_pushvalue(L, lua_upvalueindex(3 + i)); + n = g_read(L, p->f, 2); /* 'n' is number of results */ + lua_assert(n > 0); /* should return at least a nil */ + if (!lua_isnil(L, -n)) /* read at least one value? */ + return n; /* return them */ + else { /* first result is nil: EOF or error */ + if (n > 1) { /* is there error information? 
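-- g_read pushes nil plus a message only for a genuine error;
** plain EOF leaves a single nil. So the iterator raises on errors
** and merely stops the generic 'for' at end of file, and a loop
** like 'for l in io.lines(fname) do ... end' never sees a nil line.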
*/ + /* 2nd result is error message */ + return luaL_error(L, "%s", lua_tostring(L, -n + 1)); + } + if (lua_toboolean(L, lua_upvalueindex(3))) { /* generator created file? */ + lua_settop(L, 0); + lua_pushvalue(L, lua_upvalueindex(1)); + aux_close(L); /* close it */ + } + return 0; + } +} + +/* }====================================================== */ + + +static int g_write (lua_State *L, FILE *f, int arg) { + int nargs = lua_gettop(L) - arg; + int status = 1; + for (; nargs--; arg++) { + if (lua_type(L, arg) == LUA_TNUMBER) { + /* optimization: could be done exactly as for strings */ + status = status && + fprintf(f, LUA_NUMBER_FMT, lua_tonumber(L, arg)) > 0; + } + else { + size_t l; + const char *s = luaL_checklstring(L, arg, &l); + status = status && (fwrite(s, sizeof(char), l, f) == l); + } + } + if (status) return 1; /* file handle already on stack top */ + else return luaL_fileresult(L, status, NULL); +} + + +static int io_write (lua_State *L) { + return g_write(L, getiofile(L, IO_OUTPUT), 1); +} + + +static int f_write (lua_State *L) { + FILE *f = tofile(L); + lua_pushvalue(L, 1); /* push file at the stack top (to be returned) */ + return g_write(L, f, 2); +} + + +static int f_seek (lua_State *L) { + static const int mode[] = {SEEK_SET, SEEK_CUR, SEEK_END}; + static const char *const modenames[] = {"set", "cur", "end", NULL}; + FILE *f = tofile(L); + int op = luaL_checkoption(L, 2, "cur", modenames); + lua_Number p3 = luaL_optnumber(L, 3, 0); + l_seeknum offset = (l_seeknum)p3; + luaL_argcheck(L, (lua_Number)offset == p3, 3, + "not an integer in proper range"); + op = l_fseek(f, offset, mode[op]); + if (op) + return luaL_fileresult(L, 0, NULL); /* error */ + else { + lua_pushnumber(L, (lua_Number)l_ftell(f)); + return 1; + } +} + + +static int f_setvbuf (lua_State *L) { + static const int mode[] = {_IONBF, _IOFBF, _IOLBF}; + static const char *const modenames[] = {"no", "full", "line", NULL}; + FILE *f = tofile(L); + int op = luaL_checkoption(L, 2, NULL, modenames); + lua_Integer sz = luaL_optinteger(L, 3, LUAL_BUFFERSIZE); + int res = setvbuf(f, NULL, mode[op], sz); + return luaL_fileresult(L, res == 0, NULL); +} + + + +static int io_flush (lua_State *L) { + return luaL_fileresult(L, fflush(getiofile(L, IO_OUTPUT)) == 0, NULL); +} + + +static int f_flush (lua_State *L) { + return luaL_fileresult(L, fflush(tofile(L)) == 0, NULL); +} + + +/* +** functions for 'io' library +*/ +static const luaL_Reg iolib[] = { + {"close", io_close}, + {"flush", io_flush}, + {"input", io_input}, + {"lines", io_lines}, + {"open", io_open}, + {"output", io_output}, + {"popen", io_popen}, + {"read", io_read}, + {"tmpfile", io_tmpfile}, + {"type", io_type}, + {"write", io_write}, + {NULL, NULL} +}; + + +/* +** methods for file handles +*/ +static const luaL_Reg flib[] = { + {"close", io_close}, + {"flush", f_flush}, + {"lines", f_lines}, + {"read", f_read}, + {"seek", f_seek}, + {"setvbuf", f_setvbuf}, + {"write", f_write}, + {"__gc", f_gc}, + {"__tostring", f_tostring}, + {NULL, NULL} +}; + + +static void createmeta (lua_State *L) { + luaL_newmetatable(L, LUA_FILEHANDLE); /* create metatable for file handles */ + lua_pushvalue(L, -1); /* push metatable */ + lua_setfield(L, -2, "__index"); /* metatable.__index = metatable */ + luaL_setfuncs(L, flib, 0); /* add file methods to new metatable */ + lua_pop(L, 1); /* pop new metatable */ +} + + +/* +** function to (not) close the standard files stdin, stdout, and stderr +*/ +static int io_noclose (lua_State *L) { + LStream *p = tolstream(L); + p->closef = 
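/* re-arm this same "closer" so the handle stays usable after a
   failed close; in Lua, io.stdout:close() then returns nil plus
   the message pushed below */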
&io_noclose; /* keep file opened */ + lua_pushnil(L); + lua_pushliteral(L, "cannot close standard file"); + return 2; +} + + +static void createstdfile (lua_State *L, FILE *f, const char *k, + const char *fname) { + LStream *p = newprefile(L); + p->f = f; + p->closef = &io_noclose; + if (k != NULL) { + lua_pushvalue(L, -1); + lua_setfield(L, LUA_REGISTRYINDEX, k); /* add file to registry */ + } + lua_setfield(L, -2, fname); /* add file to module */ +} + + +LUAMOD_API int luaopen_io (lua_State *L) { + luaL_newlib(L, iolib); /* new module */ + createmeta(L); + /* create (and set) default files */ + createstdfile(L, stdin, IO_INPUT, "stdin"); + createstdfile(L, stdout, IO_OUTPUT, "stdout"); + createstdfile(L, stderr, NULL, "stderr"); + return 1; +} + diff --git a/ext/lua/src/llex.c b/ext/lua/src/llex.c new file mode 100644 index 000000000..1a32e348b --- /dev/null +++ b/ext/lua/src/llex.c @@ -0,0 +1,527 @@ +/* +** $Id: llex.c,v 2.63 2013/03/16 21:10:18 roberto Exp $ +** Lexical Analyzer +** See Copyright Notice in lua.h +*/ + + +#include +#include + +#define llex_c +#define LUA_CORE + +#include "lua.h" + +#include "lctype.h" +#include "ldo.h" +#include "llex.h" +#include "lobject.h" +#include "lparser.h" +#include "lstate.h" +#include "lstring.h" +#include "ltable.h" +#include "lzio.h" + + + +#define next(ls) (ls->current = zgetc(ls->z)) + + + +#define currIsNewline(ls) (ls->current == '\n' || ls->current == '\r') + + +/* ORDER RESERVED */ +static const char *const luaX_tokens [] = { + "and", "break", "do", "else", "elseif", + "end", "false", "for", "function", "goto", "if", + "in", "local", "nil", "not", "or", "repeat", + "return", "then", "true", "until", "while", + "..", "...", "==", ">=", "<=", "~=", "::", "", + "", "", "" +}; + + +#define save_and_next(ls) (save(ls, ls->current), next(ls)) + + +static l_noret lexerror (LexState *ls, const char *msg, int token); + + +static void save (LexState *ls, int c) { + Mbuffer *b = ls->buff; + if (luaZ_bufflen(b) + 1 > luaZ_sizebuffer(b)) { + size_t newsize; + if (luaZ_sizebuffer(b) >= MAX_SIZET/2) + lexerror(ls, "lexical element too long", 0); + newsize = luaZ_sizebuffer(b) * 2; + luaZ_resizebuffer(ls->L, b, newsize); + } + b->buffer[luaZ_bufflen(b)++] = cast(char, c); +} + + +void luaX_init (lua_State *L) { + int i; + for (i=0; itsv.extra = cast_byte(i+1); /* reserved word */ + } +} + + +const char *luaX_token2str (LexState *ls, int token) { + if (token < FIRST_RESERVED) { /* single-byte symbols? */ + lua_assert(token == cast(unsigned char, token)); + return (lisprint(token)) ? luaO_pushfstring(ls->L, LUA_QL("%c"), token) : + luaO_pushfstring(ls->L, "char(%d)", token); + } + else { + const char *s = luaX_tokens[token - FIRST_RESERVED]; + if (token < TK_EOS) /* fixed format (symbols and reserved words)? 
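-- every token below TK_EOS has one fixed spelling in luaX_tokens
** (reserved words plus multi-char operators such as '==' and '..'),
** so it is shown quoted; from TK_EOS on, the entries are the
** placeholder spellings ('<eof>', '<number>', '<name>', '<string>')
** returned as-is.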
*/ + return luaO_pushfstring(ls->L, LUA_QS, s); + else /* names, strings, and numerals */ + return s; + } +} + + +static const char *txtToken (LexState *ls, int token) { + switch (token) { + case TK_NAME: + case TK_STRING: + case TK_NUMBER: + save(ls, '\0'); + return luaO_pushfstring(ls->L, LUA_QS, luaZ_buffer(ls->buff)); + default: + return luaX_token2str(ls, token); + } +} + + +static l_noret lexerror (LexState *ls, const char *msg, int token) { + char buff[LUA_IDSIZE]; + luaO_chunkid(buff, getstr(ls->source), LUA_IDSIZE); + msg = luaO_pushfstring(ls->L, "%s:%d: %s", buff, ls->linenumber, msg); + if (token) + luaO_pushfstring(ls->L, "%s near %s", msg, txtToken(ls, token)); + luaD_throw(ls->L, LUA_ERRSYNTAX); +} + + +l_noret luaX_syntaxerror (LexState *ls, const char *msg) { + lexerror(ls, msg, ls->t.token); +} + + +/* +** creates a new string and anchors it in function's table so that +** it will not be collected until the end of the function's compilation +** (by that time it should be anchored in function's prototype) +*/ +TString *luaX_newstring (LexState *ls, const char *str, size_t l) { + lua_State *L = ls->L; + TValue *o; /* entry for `str' */ + TString *ts = luaS_newlstr(L, str, l); /* create new string */ + setsvalue2s(L, L->top++, ts); /* temporarily anchor it in stack */ + o = luaH_set(L, ls->fs->h, L->top - 1); + if (ttisnil(o)) { /* not in use yet? (see 'addK') */ + /* boolean value does not need GC barrier; + table has no metatable, so it does not need to invalidate cache */ + setbvalue(o, 1); /* t[string] = true */ + luaC_checkGC(L); + } + L->top--; /* remove string from stack */ + return ts; +} + + +/* +** increment line number and skips newline sequence (any of +** \n, \r, \n\r, or \r\n) +*/ +static void inclinenumber (LexState *ls) { + int old = ls->current; + lua_assert(currIsNewline(ls)); + next(ls); /* skip `\n' or `\r' */ + if (currIsNewline(ls) && ls->current != old) + next(ls); /* skip `\n\r' or `\r\n' */ + if (++ls->linenumber >= MAX_INT) + luaX_syntaxerror(ls, "chunk has too many lines"); +} + + +void luaX_setinput (lua_State *L, LexState *ls, ZIO *z, TString *source, + int firstchar) { + ls->decpoint = '.'; + ls->L = L; + ls->current = firstchar; + ls->lookahead.token = TK_EOS; /* no look-ahead token */ + ls->z = z; + ls->fs = NULL; + ls->linenumber = 1; + ls->lastline = 1; + ls->source = source; + ls->envn = luaS_new(L, LUA_ENV); /* create env name */ + luaS_fix(ls->envn); /* never collect this name */ + luaZ_resizebuffer(ls->L, ls->buff, LUA_MINBUFFER); /* initialize buffer */ +} + + + +/* +** ======================================================= +** LEXICAL ANALYZER +** ======================================================= +*/ + + + +static int check_next (LexState *ls, const char *set) { + if (ls->current == '\0' || !strchr(set, ls->current)) + return 0; + save_and_next(ls); + return 1; +} + + +/* +** change all characters 'from' in buffer to 'to' +*/ +static void buffreplace (LexState *ls, char from, char to) { + size_t n = luaZ_bufflen(ls->buff); + char *p = luaZ_buffer(ls->buff); + while (n--) + if (p[n] == from) p[n] = to; +} + + +#if !defined(getlocaledecpoint) +#define getlocaledecpoint() (localeconv()->decimal_point[0]) +#endif + + +#define buff2d(b,e) luaO_str2d(luaZ_buffer(b), luaZ_bufflen(b) - 1, e) + +/* +** in case of format error, try to change decimal point separator to +** the one defined in the current locale and check again +*/ +static void trydecpoint (LexState *ls, SemInfo *seminfo) { + char old = ls->decpoint; + ls->decpoint = 
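/* adopt the locale's separator and retry: Lua source always spells
   numerals with '.', but C's strtod may expect the locale form
   (',' under e.g. de_DE), so read_numeral rewrites the buffer to
   the current decpoint before each conversion attempt */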
getlocaledecpoint(); + buffreplace(ls, old, ls->decpoint); /* try new decimal separator */ + if (!buff2d(ls->buff, &seminfo->r)) { + /* format error with correct decimal point: no more options */ + buffreplace(ls, ls->decpoint, '.'); /* undo change (for error message) */ + lexerror(ls, "malformed number", TK_NUMBER); + } +} + + +/* LUA_NUMBER */ +/* +** this function is quite liberal in what it accepts, as 'luaO_str2d' +** will reject ill-formed numerals. +*/ +static void read_numeral (LexState *ls, SemInfo *seminfo) { + const char *expo = "Ee"; + int first = ls->current; + lua_assert(lisdigit(ls->current)); + save_and_next(ls); + if (first == '0' && check_next(ls, "Xx")) /* hexadecimal? */ + expo = "Pp"; + for (;;) { + if (check_next(ls, expo)) /* exponent part? */ + check_next(ls, "+-"); /* optional exponent sign */ + if (lisxdigit(ls->current) || ls->current == '.') + save_and_next(ls); + else break; + } + save(ls, '\0'); + buffreplace(ls, '.', ls->decpoint); /* follow locale for decimal point */ + if (!buff2d(ls->buff, &seminfo->r)) /* format error? */ + trydecpoint(ls, seminfo); /* try to update decimal point separator */ +} + + +/* +** skip a sequence '[=*[' or ']=*]' and return its number of '='s or +** -1 if sequence is malformed +*/ +static int skip_sep (LexState *ls) { + int count = 0; + int s = ls->current; + lua_assert(s == '[' || s == ']'); + save_and_next(ls); + while (ls->current == '=') { + save_and_next(ls); + count++; + } + return (ls->current == s) ? count : (-count) - 1; +} + + +static void read_long_string (LexState *ls, SemInfo *seminfo, int sep) { + save_and_next(ls); /* skip 2nd `[' */ + if (currIsNewline(ls)) /* string starts with a newline? */ + inclinenumber(ls); /* skip it */ + for (;;) { + switch (ls->current) { + case EOZ: + lexerror(ls, (seminfo) ? 
"unfinished long string" : + "unfinished long comment", TK_EOS); + break; /* to avoid warnings */ + case ']': { + if (skip_sep(ls) == sep) { + save_and_next(ls); /* skip 2nd `]' */ + goto endloop; + } + break; + } + case '\n': case '\r': { + save(ls, '\n'); + inclinenumber(ls); + if (!seminfo) luaZ_resetbuffer(ls->buff); /* avoid wasting space */ + break; + } + default: { + if (seminfo) save_and_next(ls); + else next(ls); + } + } + } endloop: + if (seminfo) + seminfo->ts = luaX_newstring(ls, luaZ_buffer(ls->buff) + (2 + sep), + luaZ_bufflen(ls->buff) - 2*(2 + sep)); +} + + +static void escerror (LexState *ls, int *c, int n, const char *msg) { + int i; + luaZ_resetbuffer(ls->buff); /* prepare error message */ + save(ls, '\\'); + for (i = 0; i < n && c[i] != EOZ; i++) + save(ls, c[i]); + lexerror(ls, msg, TK_STRING); +} + + +static int readhexaesc (LexState *ls) { + int c[3], i; /* keep input for error message */ + int r = 0; /* result accumulator */ + c[0] = 'x'; /* for error message */ + for (i = 1; i < 3; i++) { /* read two hexadecimal digits */ + c[i] = next(ls); + if (!lisxdigit(c[i])) + escerror(ls, c, i + 1, "hexadecimal digit expected"); + r = (r << 4) + luaO_hexavalue(c[i]); + } + return r; +} + + +static int readdecesc (LexState *ls) { + int c[3], i; + int r = 0; /* result accumulator */ + for (i = 0; i < 3 && lisdigit(ls->current); i++) { /* read up to 3 digits */ + c[i] = ls->current; + r = 10*r + c[i] - '0'; + next(ls); + } + if (r > UCHAR_MAX) + escerror(ls, c, i, "decimal escape too large"); + return r; +} + + +static void read_string (LexState *ls, int del, SemInfo *seminfo) { + save_and_next(ls); /* keep delimiter (for error messages) */ + while (ls->current != del) { + switch (ls->current) { + case EOZ: + lexerror(ls, "unfinished string", TK_EOS); + break; /* to avoid warnings */ + case '\n': + case '\r': + lexerror(ls, "unfinished string", TK_STRING); + break; /* to avoid warnings */ + case '\\': { /* escape sequences */ + int c; /* final character to be saved */ + next(ls); /* do not save the `\' */ + switch (ls->current) { + case 'a': c = '\a'; goto read_save; + case 'b': c = '\b'; goto read_save; + case 'f': c = '\f'; goto read_save; + case 'n': c = '\n'; goto read_save; + case 'r': c = '\r'; goto read_save; + case 't': c = '\t'; goto read_save; + case 'v': c = '\v'; goto read_save; + case 'x': c = readhexaesc(ls); goto read_save; + case '\n': case '\r': + inclinenumber(ls); c = '\n'; goto only_save; + case '\\': case '\"': case '\'': + c = ls->current; goto read_save; + case EOZ: goto no_save; /* will raise an error next loop */ + case 'z': { /* zap following span of spaces */ + next(ls); /* skip the 'z' */ + while (lisspace(ls->current)) { + if (currIsNewline(ls)) inclinenumber(ls); + else next(ls); + } + goto no_save; + } + default: { + if (!lisdigit(ls->current)) + escerror(ls, &ls->current, 1, "invalid escape sequence"); + /* digital escape \ddd */ + c = readdecesc(ls); + goto only_save; + } + } + read_save: next(ls); /* read next character */ + only_save: save(ls, c); /* save 'c' */ + no_save: break; + } + default: + save_and_next(ls); + } + } + save_and_next(ls); /* skip delimiter */ + seminfo->ts = luaX_newstring(ls, luaZ_buffer(ls->buff) + 1, + luaZ_bufflen(ls->buff) - 2); +} + + +static int llex (LexState *ls, SemInfo *seminfo) { + luaZ_resetbuffer(ls->buff); + for (;;) { + switch (ls->current) { + case '\n': case '\r': { /* line breaks */ + inclinenumber(ls); + break; + } + case ' ': case '\f': case '\t': case '\v': { /* spaces */ + next(ls); + break; + } + 
case '-': { /* '-' or '--' (comment) */ + next(ls); + if (ls->current != '-') return '-'; + /* else is a comment */ + next(ls); + if (ls->current == '[') { /* long comment? */ + int sep = skip_sep(ls); + luaZ_resetbuffer(ls->buff); /* `skip_sep' may dirty the buffer */ + if (sep >= 0) { + read_long_string(ls, NULL, sep); /* skip long comment */ + luaZ_resetbuffer(ls->buff); /* previous call may dirty the buff. */ + break; + } + } + /* else short comment */ + while (!currIsNewline(ls) && ls->current != EOZ) + next(ls); /* skip until end of line (or end of file) */ + break; + } + case '[': { /* long string or simply '[' */ + int sep = skip_sep(ls); + if (sep >= 0) { + read_long_string(ls, seminfo, sep); + return TK_STRING; + } + else if (sep == -1) return '['; + else lexerror(ls, "invalid long string delimiter", TK_STRING); + } + case '=': { + next(ls); + if (ls->current != '=') return '='; + else { next(ls); return TK_EQ; } + } + case '<': { + next(ls); + if (ls->current != '=') return '<'; + else { next(ls); return TK_LE; } + } + case '>': { + next(ls); + if (ls->current != '=') return '>'; + else { next(ls); return TK_GE; } + } + case '~': { + next(ls); + if (ls->current != '=') return '~'; + else { next(ls); return TK_NE; } + } + case ':': { + next(ls); + if (ls->current != ':') return ':'; + else { next(ls); return TK_DBCOLON; } + } + case '"': case '\'': { /* short literal strings */ + read_string(ls, ls->current, seminfo); + return TK_STRING; + } + case '.': { /* '.', '..', '...', or number */ + save_and_next(ls); + if (check_next(ls, ".")) { + if (check_next(ls, ".")) + return TK_DOTS; /* '...' */ + else return TK_CONCAT; /* '..' */ + } + else if (!lisdigit(ls->current)) return '.'; + /* else go through */ + } + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': { + read_numeral(ls, seminfo); + return TK_NUMBER; + } + case EOZ: { + return TK_EOS; + } + default: { + if (lislalpha(ls->current)) { /* identifier or reserved word? */ + TString *ts; + do { + save_and_next(ls); + } while (lislalnum(ls->current)); + ts = luaX_newstring(ls, luaZ_buffer(ls->buff), + luaZ_bufflen(ls->buff)); + seminfo->ts = ts; + if (isreserved(ts)) /* reserved word? */ + return ts->tsv.extra - 1 + FIRST_RESERVED; + else { + return TK_NAME; + } + } + else { /* single-char tokens (+ - / ...) */ + int c = ls->current; + next(ls); + return c; + } + } + } + } +} + + +void luaX_next (LexState *ls) { + ls->lastline = ls->linenumber; + if (ls->lookahead.token != TK_EOS) { /* is there a look-ahead token? 
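-- the lexer keeps exactly one token of look-ahead: luaX_lookahead
** parks a token in ls->lookahead for the parser to peek at, and
** TK_EOS doubles as the "slot is empty" sentinel checked here.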
*/ + ls->t = ls->lookahead; /* use this one */ + ls->lookahead.token = TK_EOS; /* and discharge it */ + } + else + ls->t.token = llex(ls, &ls->t.seminfo); /* read next token */ +} + + +int luaX_lookahead (LexState *ls) { + lua_assert(ls->lookahead.token == TK_EOS); + ls->lookahead.token = llex(ls, &ls->lookahead.seminfo); + return ls->lookahead.token; +} + diff --git a/ext/lua/src/lmathlib.c b/ext/lua/src/lmathlib.c new file mode 100644 index 000000000..a49f1fd25 --- /dev/null +++ b/ext/lua/src/lmathlib.c @@ -0,0 +1,279 @@ +/* +** $Id: lmathlib.c,v 1.83 2013/03/07 18:21:32 roberto Exp $ +** Standard mathematical library +** See Copyright Notice in lua.h +*/ + + +#include +#include + +#define lmathlib_c +#define LUA_LIB + +#include "lua.h" + +#include "lauxlib.h" +#include "lualib.h" + + +#undef PI +#define PI ((lua_Number)(3.1415926535897932384626433832795)) +#define RADIANS_PER_DEGREE ((lua_Number)(PI/180.0)) + + + +static int math_abs (lua_State *L) { + lua_pushnumber(L, l_mathop(fabs)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_sin (lua_State *L) { + lua_pushnumber(L, l_mathop(sin)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_sinh (lua_State *L) { + lua_pushnumber(L, l_mathop(sinh)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_cos (lua_State *L) { + lua_pushnumber(L, l_mathop(cos)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_cosh (lua_State *L) { + lua_pushnumber(L, l_mathop(cosh)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_tan (lua_State *L) { + lua_pushnumber(L, l_mathop(tan)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_tanh (lua_State *L) { + lua_pushnumber(L, l_mathop(tanh)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_asin (lua_State *L) { + lua_pushnumber(L, l_mathop(asin)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_acos (lua_State *L) { + lua_pushnumber(L, l_mathop(acos)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_atan (lua_State *L) { + lua_pushnumber(L, l_mathop(atan)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_atan2 (lua_State *L) { + lua_pushnumber(L, l_mathop(atan2)(luaL_checknumber(L, 1), + luaL_checknumber(L, 2))); + return 1; +} + +static int math_ceil (lua_State *L) { + lua_pushnumber(L, l_mathop(ceil)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_floor (lua_State *L) { + lua_pushnumber(L, l_mathop(floor)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_fmod (lua_State *L) { + lua_pushnumber(L, l_mathop(fmod)(luaL_checknumber(L, 1), + luaL_checknumber(L, 2))); + return 1; +} + +static int math_modf (lua_State *L) { + lua_Number ip; + lua_Number fp = l_mathop(modf)(luaL_checknumber(L, 1), &ip); + lua_pushnumber(L, ip); + lua_pushnumber(L, fp); + return 2; +} + +static int math_sqrt (lua_State *L) { + lua_pushnumber(L, l_mathop(sqrt)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_pow (lua_State *L) { + lua_Number x = luaL_checknumber(L, 1); + lua_Number y = luaL_checknumber(L, 2); + lua_pushnumber(L, l_mathop(pow)(x, y)); + return 1; +} + +static int math_log (lua_State *L) { + lua_Number x = luaL_checknumber(L, 1); + lua_Number res; + if (lua_isnoneornil(L, 2)) + res = l_mathop(log)(x); + else { + lua_Number base = luaL_checknumber(L, 2); + if (base == (lua_Number)10.0) res = l_mathop(log10)(x); + else res = l_mathop(log)(x)/l_mathop(log)(base); + } + lua_pushnumber(L, res); + return 1; +} + +#if defined(LUA_COMPAT_LOG10) +static int math_log10 (lua_State *L) { + lua_pushnumber(L, 
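/* kept only for 5.1 compatibility: since 5.2, math.log takes an
   optional base, and math.log(x, 10) is routed to log10 by
   math_log above, e.g. math.log(1000, 10) --> 3 */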
l_mathop(log10)(luaL_checknumber(L, 1))); + return 1; +} +#endif + +static int math_exp (lua_State *L) { + lua_pushnumber(L, l_mathop(exp)(luaL_checknumber(L, 1))); + return 1; +} + +static int math_deg (lua_State *L) { + lua_pushnumber(L, luaL_checknumber(L, 1)/RADIANS_PER_DEGREE); + return 1; +} + +static int math_rad (lua_State *L) { + lua_pushnumber(L, luaL_checknumber(L, 1)*RADIANS_PER_DEGREE); + return 1; +} + +static int math_frexp (lua_State *L) { + int e; + lua_pushnumber(L, l_mathop(frexp)(luaL_checknumber(L, 1), &e)); + lua_pushinteger(L, e); + return 2; +} + +static int math_ldexp (lua_State *L) { + lua_Number x = luaL_checknumber(L, 1); + int ep = luaL_checkint(L, 2); + lua_pushnumber(L, l_mathop(ldexp)(x, ep)); + return 1; +} + + + +static int math_min (lua_State *L) { + int n = lua_gettop(L); /* number of arguments */ + lua_Number dmin = luaL_checknumber(L, 1); + int i; + for (i=2; i<=n; i++) { + lua_Number d = luaL_checknumber(L, i); + if (d < dmin) + dmin = d; + } + lua_pushnumber(L, dmin); + return 1; +} + + +static int math_max (lua_State *L) { + int n = lua_gettop(L); /* number of arguments */ + lua_Number dmax = luaL_checknumber(L, 1); + int i; + for (i=2; i<=n; i++) { + lua_Number d = luaL_checknumber(L, i); + if (d > dmax) + dmax = d; + } + lua_pushnumber(L, dmax); + return 1; +} + + +static int math_random (lua_State *L) { + /* the `%' avoids the (rare) case of r==1, and is needed also because on + some systems (SunOS!) `rand()' may return a value larger than RAND_MAX */ + lua_Number r = (lua_Number)(rand()%RAND_MAX) / (lua_Number)RAND_MAX; + switch (lua_gettop(L)) { /* check number of arguments */ + case 0: { /* no arguments */ + lua_pushnumber(L, r); /* Number between 0 and 1 */ + break; + } + case 1: { /* only upper limit */ + lua_Number u = luaL_checknumber(L, 1); + luaL_argcheck(L, (lua_Number)1.0 <= u, 1, "interval is empty"); + lua_pushnumber(L, l_mathop(floor)(r*u) + (lua_Number)(1.0)); /* [1, u] */ + break; + } + case 2: { /* lower and upper limits */ + lua_Number l = luaL_checknumber(L, 1); + lua_Number u = luaL_checknumber(L, 2); + luaL_argcheck(L, l <= u, 2, "interval is empty"); + lua_pushnumber(L, l_mathop(floor)(r*(u-l+1)) + l); /* [l, u] */ + break; + } + default: return luaL_error(L, "wrong number of arguments"); + } + return 1; +} + + +static int math_randomseed (lua_State *L) { + srand(luaL_checkunsigned(L, 1)); + (void)rand(); /* discard first value to avoid undesirable correlations */ + return 0; +} + + +static const luaL_Reg mathlib[] = { + {"abs", math_abs}, + {"acos", math_acos}, + {"asin", math_asin}, + {"atan2", math_atan2}, + {"atan", math_atan}, + {"ceil", math_ceil}, + {"cosh", math_cosh}, + {"cos", math_cos}, + {"deg", math_deg}, + {"exp", math_exp}, + {"floor", math_floor}, + {"fmod", math_fmod}, + {"frexp", math_frexp}, + {"ldexp", math_ldexp}, +#if defined(LUA_COMPAT_LOG10) + {"log10", math_log10}, +#endif + {"log", math_log}, + {"max", math_max}, + {"min", math_min}, + {"modf", math_modf}, + {"pow", math_pow}, + {"rad", math_rad}, + {"random", math_random}, + {"randomseed", math_randomseed}, + {"sinh", math_sinh}, + {"sin", math_sin}, + {"sqrt", math_sqrt}, + {"tanh", math_tanh}, + {"tan", math_tan}, + {NULL, NULL} +}; + + +/* +** Open math library +*/ +LUAMOD_API int luaopen_math (lua_State *L) { + luaL_newlib(L, mathlib); + lua_pushnumber(L, PI); + lua_setfield(L, -2, "pi"); + lua_pushnumber(L, HUGE_VAL); + lua_setfield(L, -2, "huge"); + return 1; +} + diff --git a/ext/lua/src/lmem.c b/ext/lua/src/lmem.c new file mode 100644 index 
000000000..3f88496e0 --- /dev/null +++ b/ext/lua/src/lmem.c @@ -0,0 +1,99 @@ +/* +** $Id: lmem.c,v 1.84 2012/05/23 15:41:53 roberto Exp $ +** Interface to Memory Manager +** See Copyright Notice in lua.h +*/ + + +#include + +#define lmem_c +#define LUA_CORE + +#include "lua.h" + +#include "ldebug.h" +#include "ldo.h" +#include "lgc.h" +#include "lmem.h" +#include "lobject.h" +#include "lstate.h" + + + +/* +** About the realloc function: +** void * frealloc (void *ud, void *ptr, size_t osize, size_t nsize); +** (`osize' is the old size, `nsize' is the new size) +** +** * frealloc(ud, NULL, x, s) creates a new block of size `s' (no +** matter 'x'). +** +** * frealloc(ud, p, x, 0) frees the block `p' +** (in this specific case, frealloc must return NULL); +** particularly, frealloc(ud, NULL, 0, 0) does nothing +** (which is equivalent to free(NULL) in ANSI C) +** +** frealloc returns NULL if it cannot create or reallocate the area +** (any reallocation to an equal or smaller size cannot fail!) +*/ + + + +#define MINSIZEARRAY 4 + + +void *luaM_growaux_ (lua_State *L, void *block, int *size, size_t size_elems, + int limit, const char *what) { + void *newblock; + int newsize; + if (*size >= limit/2) { /* cannot double it? */ + if (*size >= limit) /* cannot grow even a little? */ + luaG_runerror(L, "too many %s (limit is %d)", what, limit); + newsize = limit; /* still have at least one free place */ + } + else { + newsize = (*size)*2; + if (newsize < MINSIZEARRAY) + newsize = MINSIZEARRAY; /* minimum size */ + } + newblock = luaM_reallocv(L, block, *size, newsize, size_elems); + *size = newsize; /* update only when everything else is OK */ + return newblock; +} + + +l_noret luaM_toobig (lua_State *L) { + luaG_runerror(L, "memory allocation error: block too big"); +} + + + +/* +** generic allocation routine. +*/ +void *luaM_realloc_ (lua_State *L, void *block, size_t osize, size_t nsize) { + void *newblock; + global_State *g = G(L); + size_t realosize = (block) ? osize : 0; + lua_assert((realosize == 0) == (block == NULL)); +#if defined(HARDMEMTESTS) + if (nsize > realosize && g->gcrunning) + luaC_fullgc(L, 1); /* force a GC whenever possible */ +#endif + newblock = (*g->frealloc)(g->ud, block, osize, nsize); + if (newblock == NULL && nsize > 0) { + api_check(L, nsize > realosize, + "realloc cannot fail when shrinking a block"); + if (g->gcrunning) { + luaC_fullgc(L, 1); /* try to free some memory... */ + newblock = (*g->frealloc)(g->ud, block, osize, nsize); /* try again */ + } + if (newblock == NULL) + luaD_throw(L, LUA_ERRMEM); + } + lua_assert((nsize == 0) == (newblock == NULL)); + g->GCdebt = (g->GCdebt + nsize) - realosize; + return newblock; +} + diff --git a/ext/lua/src/loadlib.c b/ext/lua/src/loadlib.c new file mode 100644 index 000000000..a9959277b --- /dev/null +++ b/ext/lua/src/loadlib.c @@ -0,0 +1,725 @@ +/* +** $Id: loadlib.c,v 1.111 2012/05/30 12:33:44 roberto Exp $ +** Dynamic library loader for Lua +** See Copyright Notice in lua.h +** +** This module contains an implementation of loadlib for Unix systems +** that have dlfcn, an implementation for Windows, and a stub for other +** systems. +*/ + + +/* +** if needed, includes windows header before everything else +*/ +#if defined(_WIN32) +#include +#endif + + +#include +#include + + +#define loadlib_c +#define LUA_LIB + +#include "lua.h" + +#include "lauxlib.h" +#include "lualib.h" + + +/* +** LUA_PATH and LUA_CPATH are the names of the environment +** variables that Lua check to set its paths. 
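** Versioned variants are consulted first: setpath below reads
** LUA_PATH_5_2 before falling back to LUA_PATH (see
** LUA_PATHVERSION). In either variable, ";;" stands for "insert
** the default path here", e.g. export LUA_PATH_5_2="./?.lua;;".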
+*/ +#if !defined(LUA_PATH) +#define LUA_PATH "LUA_PATH" +#endif + +#if !defined(LUA_CPATH) +#define LUA_CPATH "LUA_CPATH" +#endif + +#define LUA_PATHSUFFIX "_" LUA_VERSION_MAJOR "_" LUA_VERSION_MINOR + +#define LUA_PATHVERSION LUA_PATH LUA_PATHSUFFIX +#define LUA_CPATHVERSION LUA_CPATH LUA_PATHSUFFIX + +/* +** LUA_PATH_SEP is the character that separates templates in a path. +** LUA_PATH_MARK is the string that marks the substitution points in a +** template. +** LUA_EXEC_DIR in a Windows path is replaced by the executable's +** directory. +** LUA_IGMARK is a mark to ignore all before it when building the +** luaopen_ function name. +*/ +#if !defined (LUA_PATH_SEP) +#define LUA_PATH_SEP ";" +#endif +#if !defined (LUA_PATH_MARK) +#define LUA_PATH_MARK "?" +#endif +#if !defined (LUA_EXEC_DIR) +#define LUA_EXEC_DIR "!" +#endif +#if !defined (LUA_IGMARK) +#define LUA_IGMARK "-" +#endif + + +/* +** LUA_CSUBSEP is the character that replaces dots in submodule names +** when searching for a C loader. +** LUA_LSUBSEP is the character that replaces dots in submodule names +** when searching for a Lua loader. +*/ +#if !defined(LUA_CSUBSEP) +#define LUA_CSUBSEP LUA_DIRSEP +#endif + +#if !defined(LUA_LSUBSEP) +#define LUA_LSUBSEP LUA_DIRSEP +#endif + + +/* prefix for open functions in C libraries */ +#define LUA_POF "luaopen_" + +/* separator for open functions in C libraries */ +#define LUA_OFSEP "_" + + +/* table (in the registry) that keeps handles for all loaded C libraries */ +#define CLIBS "_CLIBS" + +#define LIB_FAIL "open" + + +/* error codes for ll_loadfunc */ +#define ERRLIB 1 +#define ERRFUNC 2 + +#define setprogdir(L) ((void)0) + + +/* +** system-dependent functions +*/ +static void ll_unloadlib (void *lib); +static void *ll_load (lua_State *L, const char *path, int seeglb); +static lua_CFunction ll_sym (lua_State *L, void *lib, const char *sym); + + + +#if defined(LUA_USE_DLOPEN) +/* +** {======================================================================== +** This is an implementation of loadlib based on the dlfcn interface. +** The dlfcn interface is available in Linux, SunOS, Solaris, IRIX, FreeBSD, +** NetBSD, AIX 4.2, HPUX 11, and probably most other Unix flavors, at least +** as an emulation layer on top of native functions. +** ========================================================================= +*/ + +#include + +static void ll_unloadlib (void *lib) { + dlclose(lib); +} + + +static void *ll_load (lua_State *L, const char *path, int seeglb) { + void *lib = dlopen(path, RTLD_NOW | (seeglb ? RTLD_GLOBAL : RTLD_LOCAL)); + if (lib == NULL) lua_pushstring(L, dlerror()); + return lib; +} + + +static lua_CFunction ll_sym (lua_State *L, void *lib, const char *sym) { + lua_CFunction f = (lua_CFunction)dlsym(lib, sym); + if (f == NULL) lua_pushstring(L, dlerror()); + return f; +} + +/* }====================================================== */ + + + +#elif defined(LUA_DL_DLL) +/* +** {====================================================================== +** This is an implementation of loadlib for Windows using native functions. 
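** Of the three backends in this file (dlfcn, this one, and the
** "absent" stub) exactly one is compiled in. This backend also
** defines a real setprogdir, which substitutes the executable's
** directory for the '!' (LUA_EXEC_DIR) mark when the path fields
** are built.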
+** ======================================================================= +*/ + +#undef setprogdir + +/* +** optional flags for LoadLibraryEx +*/ +#if !defined(LUA_LLE_FLAGS) +#define LUA_LLE_FLAGS 0 +#endif + + +static void setprogdir (lua_State *L) { + char buff[MAX_PATH + 1]; + char *lb; + DWORD nsize = sizeof(buff)/sizeof(char); + DWORD n = GetModuleFileNameA(NULL, buff, nsize); + if (n == 0 || n == nsize || (lb = strrchr(buff, '\\')) == NULL) + luaL_error(L, "unable to get ModuleFileName"); + else { + *lb = '\0'; + luaL_gsub(L, lua_tostring(L, -1), LUA_EXEC_DIR, buff); + lua_remove(L, -2); /* remove original string */ + } +} + + +static void pusherror (lua_State *L) { + int error = GetLastError(); + char buffer[128]; + if (FormatMessageA(FORMAT_MESSAGE_IGNORE_INSERTS | FORMAT_MESSAGE_FROM_SYSTEM, + NULL, error, 0, buffer, sizeof(buffer)/sizeof(char), NULL)) + lua_pushstring(L, buffer); + else + lua_pushfstring(L, "system error %d\n", error); +} + +static void ll_unloadlib (void *lib) { + FreeLibrary((HMODULE)lib); +} + + +static void *ll_load (lua_State *L, const char *path, int seeglb) { + HMODULE lib = LoadLibraryExA(path, NULL, LUA_LLE_FLAGS); + (void)(seeglb); /* not used: symbols are 'global' by default */ + if (lib == NULL) pusherror(L); + return lib; +} + + +static lua_CFunction ll_sym (lua_State *L, void *lib, const char *sym) { + lua_CFunction f = (lua_CFunction)GetProcAddress((HMODULE)lib, sym); + if (f == NULL) pusherror(L); + return f; +} + +/* }====================================================== */ + + +#else +/* +** {====================================================== +** Fallback for other systems +** ======================================================= +*/ + +#undef LIB_FAIL +#define LIB_FAIL "absent" + + +#define DLMSG "dynamic libraries not enabled; check your Lua installation" + + +static void ll_unloadlib (void *lib) { + (void)(lib); /* not used */ +} + + +static void *ll_load (lua_State *L, const char *path, int seeglb) { + (void)(path); (void)(seeglb); /* not used */ + lua_pushliteral(L, DLMSG); + return NULL; +} + + +static lua_CFunction ll_sym (lua_State *L, void *lib, const char *sym) { + (void)(lib); (void)(sym); /* not used */ + lua_pushliteral(L, DLMSG); + return NULL; +} + +/* }====================================================== */ +#endif + + +static void *ll_checkclib (lua_State *L, const char *path) { + void *plib; + lua_getfield(L, LUA_REGISTRYINDEX, CLIBS); + lua_getfield(L, -1, path); + plib = lua_touserdata(L, -1); /* plib = CLIBS[path] */ + lua_pop(L, 2); /* pop CLIBS table and 'plib' */ + return plib; +} + + +static void ll_addtoclib (lua_State *L, const char *path, void *plib) { + lua_getfield(L, LUA_REGISTRYINDEX, CLIBS); + lua_pushlightuserdata(L, plib); + lua_pushvalue(L, -1); + lua_setfield(L, -3, path); /* CLIBS[path] = plib */ + lua_rawseti(L, -2, luaL_len(L, -2) + 1); /* CLIBS[#CLIBS + 1] = plib */ + lua_pop(L, 1); /* pop CLIBS table */ +} + + +/* +** __gc tag method for CLIBS table: calls 'll_unloadlib' for all lib +** handles in list CLIBS +*/ +static int gctm (lua_State *L) { + int n = luaL_len(L, 1); + for (; n >= 1; n--) { /* for each handle, in reverse order */ + lua_rawgeti(L, 1, n); /* get handle CLIBS[n] */ + ll_unloadlib(lua_touserdata(L, -1)); + lua_pop(L, 1); /* pop handle */ + } + return 0; +} + + +static int ll_loadfunc (lua_State *L, const char *path, const char *sym) { + void *reg = ll_checkclib(L, path); /* check loaded C libraries */ + if (reg == NULL) { /* must load library? 
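-- handles are cached in the registry's CLIBS table, so a library
** is loaded at most once per state. A leading '*' in 'sym' means
** "just link the library, resolve no function", as in the Lua call
** package.loadlib("libfoo.so", "*") (library name illustrative).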
*/ + reg = ll_load(L, path, *sym == '*'); + if (reg == NULL) return ERRLIB; /* unable to load library */ + ll_addtoclib(L, path, reg); + } + if (*sym == '*') { /* loading only library (no function)? */ + lua_pushboolean(L, 1); /* return 'true' */ + return 0; /* no errors */ + } + else { + lua_CFunction f = ll_sym(L, reg, sym); + if (f == NULL) + return ERRFUNC; /* unable to find function */ + lua_pushcfunction(L, f); /* else create new function */ + return 0; /* no errors */ + } +} + + +static int ll_loadlib (lua_State *L) { + const char *path = luaL_checkstring(L, 1); + const char *init = luaL_checkstring(L, 2); + int stat = ll_loadfunc(L, path, init); + if (stat == 0) /* no errors? */ + return 1; /* return the loaded function */ + else { /* error; error message is on stack top */ + lua_pushnil(L); + lua_insert(L, -2); + lua_pushstring(L, (stat == ERRLIB) ? LIB_FAIL : "init"); + return 3; /* return nil, error message, and where */ + } +} + + + +/* +** {====================================================== +** 'require' function +** ======================================================= +*/ + + +static int readable (const char *filename) { + FILE *f = fopen(filename, "r"); /* try to open file */ + if (f == NULL) return 0; /* open failed */ + fclose(f); + return 1; +} + + +static const char *pushnexttemplate (lua_State *L, const char *path) { + const char *l; + while (*path == *LUA_PATH_SEP) path++; /* skip separators */ + if (*path == '\0') return NULL; /* no more templates */ + l = strchr(path, *LUA_PATH_SEP); /* find next separator */ + if (l == NULL) l = path + strlen(path); + lua_pushlstring(L, path, l - path); /* template */ + return l; +} + + +static const char *searchpath (lua_State *L, const char *name, + const char *path, + const char *sep, + const char *dirsep) { + luaL_Buffer msg; /* to build error message */ + luaL_buffinit(L, &msg); + if (*sep != '\0') /* non-empty separator? */ + name = luaL_gsub(L, name, sep, dirsep); /* replace it by 'dirsep' */ + while ((path = pushnexttemplate(L, path)) != NULL) { + const char *filename = luaL_gsub(L, lua_tostring(L, -1), + LUA_PATH_MARK, name); + lua_remove(L, -2); /* remove path template */ + if (readable(filename)) /* does file exist and is readable? */ + return filename; /* return that file name */ + lua_pushfstring(L, "\n\tno file " LUA_QS, filename); + lua_remove(L, -2); /* remove file name */ + luaL_addvalue(&msg); /* concatenate error msg. entry */ + } + luaL_pushresult(&msg); /* create error message */ + return NULL; /* not found */ +} + + +static int ll_searchpath (lua_State *L) { + const char *f = searchpath(L, luaL_checkstring(L, 1), + luaL_checkstring(L, 2), + luaL_optstring(L, 3, "."), + luaL_optstring(L, 4, LUA_DIRSEP)); + if (f != NULL) return 1; + else { /* error message is on top of the stack */ + lua_pushnil(L); + lua_insert(L, -2); + return 2; /* return nil + error message */ + } +} + + +static const char *findfile (lua_State *L, const char *name, + const char *pname, + const char *dirsep) { + const char *path; + lua_getfield(L, lua_upvalueindex(1), pname); + path = lua_tostring(L, -1); + if (path == NULL) + luaL_error(L, LUA_QL("package.%s") " must be a string", pname); + return searchpath(L, name, path, ".", dirsep); +} + + +static int checkload (lua_State *L, int stat, const char *filename) { + if (stat) { /* module loaded successfully? 
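-- the searcher contract: on success push the loader function plus
** the file name (require passes that name as the loader's second
** argument); on failure raise, folding in the searcher's message.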
*/ + lua_pushstring(L, filename); /* will be 2nd argument to module */ + return 2; /* return open function and file name */ + } + else + return luaL_error(L, "error loading module " LUA_QS + " from file " LUA_QS ":\n\t%s", + lua_tostring(L, 1), filename, lua_tostring(L, -1)); +} + + +static int searcher_Lua (lua_State *L) { + const char *filename; + const char *name = luaL_checkstring(L, 1); + filename = findfile(L, name, "path", LUA_LSUBSEP); + if (filename == NULL) return 1; /* module not found in this path */ + return checkload(L, (luaL_loadfile(L, filename) == LUA_OK), filename); +} + + +static int loadfunc (lua_State *L, const char *filename, const char *modname) { + const char *funcname; + const char *mark; + modname = luaL_gsub(L, modname, ".", LUA_OFSEP); + mark = strchr(modname, *LUA_IGMARK); + if (mark) { + int stat; + funcname = lua_pushlstring(L, modname, mark - modname); + funcname = lua_pushfstring(L, LUA_POF"%s", funcname); + stat = ll_loadfunc(L, filename, funcname); + if (stat != ERRFUNC) return stat; + modname = mark + 1; /* else go ahead and try old-style name */ + } + funcname = lua_pushfstring(L, LUA_POF"%s", modname); + return ll_loadfunc(L, filename, funcname); +} + + +static int searcher_C (lua_State *L) { + const char *name = luaL_checkstring(L, 1); + const char *filename = findfile(L, name, "cpath", LUA_CSUBSEP); + if (filename == NULL) return 1; /* module not found in this path */ + return checkload(L, (loadfunc(L, filename, name) == 0), filename); +} + + +static int searcher_Croot (lua_State *L) { + const char *filename; + const char *name = luaL_checkstring(L, 1); + const char *p = strchr(name, '.'); + int stat; + if (p == NULL) return 0; /* is root */ + lua_pushlstring(L, name, p - name); + filename = findfile(L, lua_tostring(L, -1), "cpath", LUA_CSUBSEP); + if (filename == NULL) return 1; /* root not found */ + if ((stat = loadfunc(L, filename, name)) != 0) { + if (stat != ERRFUNC) + return checkload(L, 0, filename); /* real error */ + else { /* open function not found */ + lua_pushfstring(L, "\n\tno module " LUA_QS " in file " LUA_QS, + name, filename); + return 1; + } + } + lua_pushstring(L, filename); /* will be 2nd argument to module */ + return 2; +} + + +static int searcher_preload (lua_State *L) { + const char *name = luaL_checkstring(L, 1); + lua_getfield(L, LUA_REGISTRYINDEX, "_PRELOAD"); + lua_getfield(L, -1, name); + if (lua_isnil(L, -1)) /* not found? */ + lua_pushfstring(L, "\n\tno field package.preload['%s']", name); + return 1; +} + + +static void findloader (lua_State *L, const char *name) { + int i; + luaL_Buffer msg; /* to build error message */ + luaL_buffinit(L, &msg); + lua_getfield(L, lua_upvalueindex(1), "searchers"); /* will be at index 3 */ + if (!lua_istable(L, 3)) + luaL_error(L, LUA_QL("package.searchers") " must be a table"); + /* iterate over available searchers to find a loader */ + for (i = 1; ; i++) { + lua_rawgeti(L, 3, i); /* get a searcher */ + if (lua_isnil(L, -1)) { /* no more searchers? */ + lua_pop(L, 1); /* remove nil */ + luaL_pushresult(&msg); /* create error message */ + luaL_error(L, "module " LUA_QS " not found:%s", + name, lua_tostring(L, -1)); + } + lua_pushstring(L, name); + lua_call(L, 1, 2); /* call it */ + if (lua_isfunction(L, -2)) /* did it find a loader? */ + return; /* module loader found */ + else if (lua_isstring(L, -2)) { /* searcher returned error message? 
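-- each failing searcher may return a string describing what it
** tried; findloader concatenates these, which is how require builds
** its multi-line "module 'x' not found:" report listing every file
** and preload slot that was checked.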
*/ + lua_pop(L, 1); /* remove extra return */ + luaL_addvalue(&msg); /* concatenate error message */ + } + else + lua_pop(L, 2); /* remove both returns */ + } +} + + +static int ll_require (lua_State *L) { + const char *name = luaL_checkstring(L, 1); + lua_settop(L, 1); /* _LOADED table will be at index 2 */ + lua_getfield(L, LUA_REGISTRYINDEX, "_LOADED"); + lua_getfield(L, 2, name); /* _LOADED[name] */ + if (lua_toboolean(L, -1)) /* is it there? */ + return 1; /* package is already loaded */ + /* else must load package */ + lua_pop(L, 1); /* remove 'getfield' result */ + findloader(L, name); + lua_pushstring(L, name); /* pass name as argument to module loader */ + lua_insert(L, -2); /* name is 1st argument (before search data) */ + lua_call(L, 2, 1); /* run loader to load module */ + if (!lua_isnil(L, -1)) /* non-nil return? */ + lua_setfield(L, 2, name); /* _LOADED[name] = returned value */ + lua_getfield(L, 2, name); + if (lua_isnil(L, -1)) { /* module did not set a value? */ + lua_pushboolean(L, 1); /* use true as result */ + lua_pushvalue(L, -1); /* extra copy to be returned */ + lua_setfield(L, 2, name); /* _LOADED[name] = true */ + } + return 1; +} + +/* }====================================================== */ + + + +/* +** {====================================================== +** 'module' function +** ======================================================= +*/ +#if defined(LUA_COMPAT_MODULE) + +/* +** changes the environment variable of calling function +*/ +static void set_env (lua_State *L) { + lua_Debug ar; + if (lua_getstack(L, 1, &ar) == 0 || + lua_getinfo(L, "f", &ar) == 0 || /* get calling function */ + lua_iscfunction(L, -1)) + luaL_error(L, LUA_QL("module") " not called from a Lua function"); + lua_pushvalue(L, -2); /* copy new environment table to top */ + lua_setupvalue(L, -2, 1); + lua_pop(L, 1); /* remove function */ +} + + +static void dooptions (lua_State *L, int n) { + int i; + for (i = 2; i <= n; i++) { + if (lua_isfunction(L, i)) { /* avoid 'calling' extra info. */ + lua_pushvalue(L, i); /* get option (a function) */ + lua_pushvalue(L, -2); /* module */ + lua_call(L, 1, 0); + } + } +} + + +static void modinit (lua_State *L, const char *modname) { + const char *dot; + lua_pushvalue(L, -1); + lua_setfield(L, -2, "_M"); /* module._M = module */ + lua_pushstring(L, modname); + lua_setfield(L, -2, "_NAME"); + dot = strrchr(modname, '.'); /* look for last dot in module name */ + if (dot == NULL) dot = modname; + else dot++; + /* set _PACKAGE as package name (full module name minus last part) */ + lua_pushlstring(L, modname, dot - modname); + lua_setfield(L, -2, "_PACKAGE"); +} + + +static int ll_module (lua_State *L) { + const char *modname = luaL_checkstring(L, 1); + int lastarg = lua_gettop(L); /* last parameter */ + luaL_pushmodule(L, modname, 1); /* get/create module table */ + /* check whether table already has a _NAME field */ + lua_getfield(L, -1, "_NAME"); + if (!lua_isnil(L, -1)) /* is table an initialized module? 
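-- a _NAME field marks a table that modinit has already set up, so
** re-entering a module reuses it; a fresh table gets _M, _NAME and
** _PACKAGE (the name minus its last component: "a." for "a.b").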
*/ + lua_pop(L, 1); + else { /* no; initialize it */ + lua_pop(L, 1); + modinit(L, modname); + } + lua_pushvalue(L, -1); + set_env(L); + dooptions(L, lastarg); + return 1; +} + + +static int ll_seeall (lua_State *L) { + luaL_checktype(L, 1, LUA_TTABLE); + if (!lua_getmetatable(L, 1)) { + lua_createtable(L, 0, 1); /* create new metatable */ + lua_pushvalue(L, -1); + lua_setmetatable(L, 1); + } + lua_pushglobaltable(L); + lua_setfield(L, -2, "__index"); /* mt.__index = _G */ + return 0; +} + +#endif +/* }====================================================== */ + + + +/* auxiliary mark (for internal use) */ +#define AUXMARK "\1" + + +/* +** return registry.LUA_NOENV as a boolean +*/ +static int noenv (lua_State *L) { + int b; + lua_getfield(L, LUA_REGISTRYINDEX, "LUA_NOENV"); + b = lua_toboolean(L, -1); + lua_pop(L, 1); /* remove value */ + return b; +} + + +static void setpath (lua_State *L, const char *fieldname, const char *envname1, + const char *envname2, const char *def) { + const char *path = getenv(envname1); + if (path == NULL) /* no environment variable? */ + path = getenv(envname2); /* try alternative name */ + if (path == NULL || noenv(L)) /* no environment variable? */ + lua_pushstring(L, def); /* use default */ + else { + /* replace ";;" by ";AUXMARK;" and then AUXMARK by default path */ + path = luaL_gsub(L, path, LUA_PATH_SEP LUA_PATH_SEP, + LUA_PATH_SEP AUXMARK LUA_PATH_SEP); + luaL_gsub(L, path, AUXMARK, def); + lua_remove(L, -2); + } + setprogdir(L); + lua_setfield(L, -2, fieldname); +} + + +static const luaL_Reg pk_funcs[] = { + {"loadlib", ll_loadlib}, + {"searchpath", ll_searchpath}, +#if defined(LUA_COMPAT_MODULE) + {"seeall", ll_seeall}, +#endif + {NULL, NULL} +}; + + +static const luaL_Reg ll_funcs[] = { +#if defined(LUA_COMPAT_MODULE) + {"module", ll_module}, +#endif + {"require", ll_require}, + {NULL, NULL} +}; + + +static void createsearcherstable (lua_State *L) { + static const lua_CFunction searchers[] = + {searcher_preload, searcher_Lua, searcher_C, searcher_Croot, NULL}; + int i; + /* create 'searchers' table */ + lua_createtable(L, sizeof(searchers)/sizeof(searchers[0]) - 1, 0); + /* fill it with pre-defined searchers */ + for (i=0; searchers[i] != NULL; i++) { + lua_pushvalue(L, -2); /* set 'package' as upvalue for all searchers */ + lua_pushcclosure(L, searchers[i], 1); + lua_rawseti(L, -2, i+1); + } +} + + +LUAMOD_API int luaopen_package (lua_State *L) { + /* create table CLIBS to keep track of loaded C libraries */ + luaL_getsubtable(L, LUA_REGISTRYINDEX, CLIBS); + lua_createtable(L, 0, 1); /* metatable for CLIBS */ + lua_pushcfunction(L, gctm); + lua_setfield(L, -2, "__gc"); /* set finalizer for CLIBS table */ + lua_setmetatable(L, -2); + /* create `package' table */ + luaL_newlib(L, pk_funcs); + createsearcherstable(L); +#if defined(LUA_COMPAT_LOADERS) + lua_pushvalue(L, -1); /* make a copy of 'searchers' table */ + lua_setfield(L, -3, "loaders"); /* put it in field `loaders' */ +#endif + lua_setfield(L, -2, "searchers"); /* put it in field 'searchers' */ + /* set field 'path' */ + setpath(L, "path", LUA_PATHVERSION, LUA_PATH, LUA_PATH_DEFAULT); + /* set field 'cpath' */ + setpath(L, "cpath", LUA_CPATHVERSION, LUA_CPATH, LUA_CPATH_DEFAULT); + /* store config information */ + lua_pushliteral(L, LUA_DIRSEP "\n" LUA_PATH_SEP "\n" LUA_PATH_MARK "\n" + LUA_EXEC_DIR "\n" LUA_IGMARK "\n"); + lua_setfield(L, -2, "config"); + /* set field `loaded' */ + luaL_getsubtable(L, LUA_REGISTRYINDEX, "_LOADED"); + lua_setfield(L, -2, "loaded"); + /* set field `preload' */ 
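/*
** package.preload backs the first searcher: a host can register a
** module there so require finds it without touching the file
** system, e.g. (name "mymod" illustrative):
**   package.preload["mymod"] = function (name) return { answer = 42 } end
*/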
+ luaL_getsubtable(L, LUA_REGISTRYINDEX, "_PRELOAD"); + lua_setfield(L, -2, "preload"); + lua_pushglobaltable(L); + lua_pushvalue(L, -2); /* set 'package' as upvalue for next lib */ + luaL_setfuncs(L, ll_funcs, 1); /* open lib into global table */ + lua_pop(L, 1); /* pop global table */ + return 1; /* return 'package' table */ +} + diff --git a/ext/lua/src/lobject.c b/ext/lua/src/lobject.c new file mode 100644 index 000000000..c152785a5 --- /dev/null +++ b/ext/lua/src/lobject.c @@ -0,0 +1,287 @@ +/* +** $Id: lobject.c,v 2.58 2013/02/20 14:08:56 roberto Exp $ +** Some generic functions over Lua objects +** See Copyright Notice in lua.h +*/ + +#include +#include +#include +#include + +#define lobject_c +#define LUA_CORE + +#include "lua.h" + +#include "lctype.h" +#include "ldebug.h" +#include "ldo.h" +#include "lmem.h" +#include "lobject.h" +#include "lstate.h" +#include "lstring.h" +#include "lvm.h" + + + +LUAI_DDEF const TValue luaO_nilobject_ = {NILCONSTANT}; + + +/* +** converts an integer to a "floating point byte", represented as +** (eeeeexxx), where the real value is (1xxx) * 2^(eeeee - 1) if +** eeeee != 0 and (xxx) otherwise. +*/ +int luaO_int2fb (unsigned int x) { + int e = 0; /* exponent */ + if (x < 8) return x; + while (x >= 0x10) { + x = (x+1) >> 1; + e++; + } + return ((e+1) << 3) | (cast_int(x) - 8); +} + + +/* converts back */ +int luaO_fb2int (int x) { + int e = (x >> 3) & 0x1f; + if (e == 0) return x; + else return ((x & 7) + 8) << (e - 1); +} + + +int luaO_ceillog2 (unsigned int x) { + static const lu_byte log_2[256] = { + 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5, + 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6, + 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, + 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, + 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, + 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, + 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, + 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8 + }; + int l = 0; + x--; + while (x >= 256) { l += 8; x >>= 8; } + return l + log_2[x]; +} + + +lua_Number luaO_arith (int op, lua_Number v1, lua_Number v2) { + switch (op) { + case LUA_OPADD: return luai_numadd(NULL, v1, v2); + case LUA_OPSUB: return luai_numsub(NULL, v1, v2); + case LUA_OPMUL: return luai_nummul(NULL, v1, v2); + case LUA_OPDIV: return luai_numdiv(NULL, v1, v2); + case LUA_OPMOD: return luai_nummod(NULL, v1, v2); + case LUA_OPPOW: return luai_numpow(NULL, v1, v2); + case LUA_OPUNM: return luai_numunm(NULL, v1); + default: lua_assert(0); return 0; + } +} + + +int luaO_hexavalue (int c) { + if (lisdigit(c)) return c - '0'; + else return ltolower(c) - 'a' + 10; +} + + +#if !defined(lua_strx2number) + +#include + + +static int isneg (const char **s) { + if (**s == '-') { (*s)++; return 1; } + else if (**s == '+') (*s)++; + return 0; +} + + +static lua_Number readhexa (const char **s, lua_Number r, int *count) { + for (; lisxdigit(cast_uchar(**s)); (*s)++) { /* read integer part */ + r = (r * cast_num(16.0)) + cast_num(luaO_hexavalue(cast_uchar(**s))); + (*count)++; + } + return r; +} + + +/* +** convert an hexadecimal numeric string to a number, following +** C99 specification for 'strtod' +*/ +static lua_Number lua_strx2number (const char *s, char **endptr) { + lua_Number r = 0.0; + int e = 0, i = 0; + int neg = 0; /* 1 if number is negative */ + *endptr = cast(char *, s); /* nothing is valid yet */ + while 
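/* C99 hex-float syntax: "0x", hex digits, optional '.', optional
   binary exponent introduced by 'p'; e.g. "0x1.8p1" is
   (1 + 8/16) * 2^1 = 3.0 */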
(lisspace(cast_uchar(*s))) s++; /* skip initial spaces */ + neg = isneg(&s); /* check signal */ + if (!(*s == '0' && (*(s + 1) == 'x' || *(s + 1) == 'X'))) /* check '0x' */ + return 0.0; /* invalid format (no '0x') */ + s += 2; /* skip '0x' */ + r = readhexa(&s, r, &i); /* read integer part */ + if (*s == '.') { + s++; /* skip dot */ + r = readhexa(&s, r, &e); /* read fractional part */ + } + if (i == 0 && e == 0) + return 0.0; /* invalid format (no digit) */ + e *= -4; /* each fractional digit divides value by 2^-4 */ + *endptr = cast(char *, s); /* valid up to here */ + if (*s == 'p' || *s == 'P') { /* exponent part? */ + int exp1 = 0; + int neg1; + s++; /* skip 'p' */ + neg1 = isneg(&s); /* signal */ + if (!lisdigit(cast_uchar(*s))) + goto ret; /* must have at least one digit */ + while (lisdigit(cast_uchar(*s))) /* read exponent */ + exp1 = exp1 * 10 + *(s++) - '0'; + if (neg1) exp1 = -exp1; + e += exp1; + } + *endptr = cast(char *, s); /* valid up to here */ + ret: + if (neg) r = -r; + return l_mathop(ldexp)(r, e); +} + +#endif + + +int luaO_str2d (const char *s, size_t len, lua_Number *result) { + char *endptr; + if (strpbrk(s, "nN")) /* reject 'inf' and 'nan' */ + return 0; + else if (strpbrk(s, "xX")) /* hexa? */ + *result = lua_strx2number(s, &endptr); + else + *result = lua_str2number(s, &endptr); + if (endptr == s) return 0; /* nothing recognized */ + while (lisspace(cast_uchar(*endptr))) endptr++; + return (endptr == s + len); /* OK if no trailing characters */ +} + + + +static void pushstr (lua_State *L, const char *str, size_t l) { + setsvalue2s(L, L->top++, luaS_newlstr(L, str, l)); +} + + +/* this function handles only `%d', `%c', %f, %p, and `%s' formats */ +const char *luaO_pushvfstring (lua_State *L, const char *fmt, va_list argp) { + int n = 0; + for (;;) { + const char *e = strchr(fmt, '%'); + if (e == NULL) break; + luaD_checkstack(L, 2); /* fmt + item */ + pushstr(L, fmt, e - fmt); + switch (*(e+1)) { + case 's': { + const char *s = va_arg(argp, char *); + if (s == NULL) s = "(null)"; + pushstr(L, s, strlen(s)); + break; + } + case 'c': { + char buff; + buff = cast(char, va_arg(argp, int)); + pushstr(L, &buff, 1); + break; + } + case 'd': { + setnvalue(L->top++, cast_num(va_arg(argp, int))); + break; + } + case 'f': { + setnvalue(L->top++, cast_num(va_arg(argp, l_uacNumber))); + break; + } + case 'p': { + char buff[4*sizeof(void *) + 8]; /* should be enough space for a `%p' */ + int l = sprintf(buff, "%p", va_arg(argp, void *)); + pushstr(L, buff, l); + break; + } + case '%': { + pushstr(L, "%", 1); + break; + } + default: { + luaG_runerror(L, + "invalid option " LUA_QL("%%%c") " to " LUA_QL("lua_pushfstring"), + *(e + 1)); + } + } + n += 2; + fmt = e+2; + } + luaD_checkstack(L, 1); + pushstr(L, fmt, strlen(fmt)); + if (n > 0) luaV_concat(L, n + 1); + return svalue(L->top - 1); +} + + +const char *luaO_pushfstring (lua_State *L, const char *fmt, ...) { + const char *msg; + va_list argp; + va_start(argp, fmt); + msg = luaO_pushvfstring(L, fmt, argp); + va_end(argp); + return msg; +} + + +/* number of chars of a literal string without the ending \0 */ +#define LL(x) (sizeof(x)/sizeof(char) - 1) + +#define RETS "..." +#define PRE "[string \"" +#define POS "\"]" + +#define addstr(a,b,l) ( memcpy(a,b,(l) * sizeof(char)), a += (l) ) + +void luaO_chunkid (char *out, const char *source, size_t bufflen) { + size_t l = strlen(source); + if (*source == '=') { /* 'literal' source */ + if (l <= bufflen) /* small enough? 
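-- chunk ids come in three flavors: '=' means use the rest verbatim
** (e.g. "=stdin"), '@' means a file name (truncated from the left
** with "..." so the tail survives), and anything else is source
** text rendered as [string "..."], cut at the first newline.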
*/ + memcpy(out, source + 1, l * sizeof(char)); + else { /* truncate it */ + addstr(out, source + 1, bufflen - 1); + *out = '\0'; + } + } + else if (*source == '@') { /* file name */ + if (l <= bufflen) /* small enough? */ + memcpy(out, source + 1, l * sizeof(char)); + else { /* add '...' before rest of name */ + addstr(out, RETS, LL(RETS)); + bufflen -= LL(RETS); + memcpy(out, source + 1 + l - bufflen, bufflen * sizeof(char)); + } + } + else { /* string; format as [string "source"] */ + const char *nl = strchr(source, '\n'); /* find first new line (if any) */ + addstr(out, PRE, LL(PRE)); /* add prefix */ + bufflen -= LL(PRE RETS POS) + 1; /* save space for prefix+suffix+'\0' */ + if (l < bufflen && nl == NULL) { /* small one-line source? */ + addstr(out, source, l); /* keep it */ + } + else { + if (nl != NULL) l = nl - source; /* stop at first newline */ + if (l > bufflen) l = bufflen; + addstr(out, source, l); + addstr(out, RETS, LL(RETS)); + } + memcpy(out, POS, (LL(POS) + 1) * sizeof(char)); + } +} + diff --git a/ext/lua/src/lopcodes.c b/ext/lua/src/lopcodes.c new file mode 100644 index 000000000..ef7369275 --- /dev/null +++ b/ext/lua/src/lopcodes.c @@ -0,0 +1,107 @@ +/* +** $Id: lopcodes.c,v 1.49 2012/05/14 13:34:18 roberto Exp $ +** Opcodes for Lua virtual machine +** See Copyright Notice in lua.h +*/ + + +#define lopcodes_c +#define LUA_CORE + + +#include "lopcodes.h" + + +/* ORDER OP */ + +LUAI_DDEF const char *const luaP_opnames[NUM_OPCODES+1] = { + "MOVE", + "LOADK", + "LOADKX", + "LOADBOOL", + "LOADNIL", + "GETUPVAL", + "GETTABUP", + "GETTABLE", + "SETTABUP", + "SETUPVAL", + "SETTABLE", + "NEWTABLE", + "SELF", + "ADD", + "SUB", + "MUL", + "DIV", + "MOD", + "POW", + "UNM", + "NOT", + "LEN", + "CONCAT", + "JMP", + "EQ", + "LT", + "LE", + "TEST", + "TESTSET", + "CALL", + "TAILCALL", + "RETURN", + "FORLOOP", + "FORPREP", + "TFORCALL", + "TFORLOOP", + "SETLIST", + "CLOSURE", + "VARARG", + "EXTRAARG", + NULL +}; + + +#define opmode(t,a,b,c,m) (((t)<<7) | ((a)<<6) | ((b)<<4) | ((c)<<2) | (m)) + +LUAI_DDEF const lu_byte luaP_opmodes[NUM_OPCODES] = { +/* T A B C mode opcode */ + opmode(0, 1, OpArgR, OpArgN, iABC) /* OP_MOVE */ + ,opmode(0, 1, OpArgK, OpArgN, iABx) /* OP_LOADK */ + ,opmode(0, 1, OpArgN, OpArgN, iABx) /* OP_LOADKX */ + ,opmode(0, 1, OpArgU, OpArgU, iABC) /* OP_LOADBOOL */ + ,opmode(0, 1, OpArgU, OpArgN, iABC) /* OP_LOADNIL */ + ,opmode(0, 1, OpArgU, OpArgN, iABC) /* OP_GETUPVAL */ + ,opmode(0, 1, OpArgU, OpArgK, iABC) /* OP_GETTABUP */ + ,opmode(0, 1, OpArgR, OpArgK, iABC) /* OP_GETTABLE */ + ,opmode(0, 0, OpArgK, OpArgK, iABC) /* OP_SETTABUP */ + ,opmode(0, 0, OpArgU, OpArgN, iABC) /* OP_SETUPVAL */ + ,opmode(0, 0, OpArgK, OpArgK, iABC) /* OP_SETTABLE */ + ,opmode(0, 1, OpArgU, OpArgU, iABC) /* OP_NEWTABLE */ + ,opmode(0, 1, OpArgR, OpArgK, iABC) /* OP_SELF */ + ,opmode(0, 1, OpArgK, OpArgK, iABC) /* OP_ADD */ + ,opmode(0, 1, OpArgK, OpArgK, iABC) /* OP_SUB */ + ,opmode(0, 1, OpArgK, OpArgK, iABC) /* OP_MUL */ + ,opmode(0, 1, OpArgK, OpArgK, iABC) /* OP_DIV */ + ,opmode(0, 1, OpArgK, OpArgK, iABC) /* OP_MOD */ + ,opmode(0, 1, OpArgK, OpArgK, iABC) /* OP_POW */ + ,opmode(0, 1, OpArgR, OpArgN, iABC) /* OP_UNM */ + ,opmode(0, 1, OpArgR, OpArgN, iABC) /* OP_NOT */ + ,opmode(0, 1, OpArgR, OpArgN, iABC) /* OP_LEN */ + ,opmode(0, 1, OpArgR, OpArgR, iABC) /* OP_CONCAT */ + ,opmode(0, 0, OpArgR, OpArgN, iAsBx) /* OP_JMP */ + ,opmode(1, 0, OpArgK, OpArgK, iABC) /* OP_EQ */ + ,opmode(1, 0, OpArgK, OpArgK, iABC) /* OP_LT */ + ,opmode(1, 0, OpArgK, OpArgK, iABC) /* OP_LE */ + 
,opmode(1, 0, OpArgN, OpArgU, iABC)		/* OP_TEST */
+ ,opmode(1, 1, OpArgR, OpArgU, iABC)		/* OP_TESTSET */
+ ,opmode(0, 1, OpArgU, OpArgU, iABC)		/* OP_CALL */
+ ,opmode(0, 1, OpArgU, OpArgU, iABC)		/* OP_TAILCALL */
+ ,opmode(0, 0, OpArgU, OpArgN, iABC)		/* OP_RETURN */
+ ,opmode(0, 1, OpArgR, OpArgN, iAsBx)		/* OP_FORLOOP */
+ ,opmode(0, 1, OpArgR, OpArgN, iAsBx)		/* OP_FORPREP */
+ ,opmode(0, 0, OpArgN, OpArgU, iABC)		/* OP_TFORCALL */
+ ,opmode(0, 1, OpArgR, OpArgN, iAsBx)		/* OP_TFORLOOP */
+ ,opmode(0, 0, OpArgU, OpArgU, iABC)		/* OP_SETLIST */
+ ,opmode(0, 1, OpArgU, OpArgN, iABx)		/* OP_CLOSURE */
+ ,opmode(0, 1, OpArgU, OpArgN, iABC)		/* OP_VARARG */
+ ,opmode(0, 0, OpArgU, OpArgU, iAx)		/* OP_EXTRAARG */
+};
+
diff --git a/ext/lua/src/loslib.c b/ext/lua/src/loslib.c
new file mode 100644
index 000000000..5170fd0d0
--- /dev/null
+++ b/ext/lua/src/loslib.c
@@ -0,0 +1,323 @@
+/*
+** $Id: loslib.c,v 1.40 2012/10/19 15:54:02 roberto Exp $
+** Standard Operating System library
+** See Copyright Notice in lua.h
+*/
+
+
+#include <errno.h>
+#include <locale.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+
+#define loslib_c
+#define LUA_LIB
+
+#include "lua.h"
+
+#include "lauxlib.h"
+#include "lualib.h"
+
+
+/*
+** list of valid conversion specifiers for the 'strftime' function
+*/
+#if !defined(LUA_STRFTIMEOPTIONS)
+
+#if !defined(LUA_USE_POSIX)
+#define LUA_STRFTIMEOPTIONS	{ "aAbBcdHIjmMpSUwWxXyYz%", "" }
+#else
+#define LUA_STRFTIMEOPTIONS \
+	{ "aAbBcCdDeFgGhHIjmMnprRStTuUVwWxXyYzZ%", "" \
+	  "", "E", "cCxXyY",  \
+	  "O", "deHImMSuUVwWy" }
+#endif
+
+#endif
+
+
+
+/*
+** By default, Lua uses tmpnam except when POSIX is available, where it
+** uses mkstemp.
+*/
+#if defined(LUA_USE_MKSTEMP)
+#include <unistd.h>
+#define LUA_TMPNAMBUFSIZE	32
+#define lua_tmpnam(b,e) { \
+        strcpy(b, "/tmp/lua_XXXXXX"); \
+        e = mkstemp(b); \
+        if (e != -1) close(e); \
+        e = (e == -1); }
+
+#elif !defined(lua_tmpnam)
+
+#define LUA_TMPNAMBUFSIZE	L_tmpnam
+#define lua_tmpnam(b,e)		{ e = (tmpnam(b) == NULL); }
+
+#endif
+
+
+/*
+** By default, Lua uses gmtime/localtime, except when POSIX is available,
+** where it uses gmtime_r/localtime_r
+*/
+#if defined(LUA_USE_GMTIME_R)
+
+#define l_gmtime(t,r)		gmtime_r(t,r)
+#define l_localtime(t,r)	localtime_r(t,r)
+
+#elif !defined(l_gmtime)
+
+#define l_gmtime(t,r)		((void)r, gmtime(t))
+#define l_localtime(t,r)	((void)r, localtime(t))
+
+#endif
+
+
+
+static int os_execute (lua_State *L) {
+  const char *cmd = luaL_optstring(L, 1, NULL);
+  int stat = system(cmd);
+  if (cmd != NULL)
+    return luaL_execresult(L, stat);
+  else {
+    lua_pushboolean(L, stat);  /* true if there is a shell */
+    return 1;
+  }
+}
+
+
+static int os_remove (lua_State *L) {
+  const char *filename = luaL_checkstring(L, 1);
+  return luaL_fileresult(L, remove(filename) == 0, filename);
+}
+
+
+static int os_rename (lua_State *L) {
+  const char *fromname = luaL_checkstring(L, 1);
+  const char *toname = luaL_checkstring(L, 2);
+  return luaL_fileresult(L, rename(fromname, toname) == 0, NULL);
+}
+
+
+static int os_tmpname (lua_State *L) {
+  char buff[LUA_TMPNAMBUFSIZE];
+  int err;
+  lua_tmpnam(buff, err);
+  if (err)
+    return luaL_error(L, "unable to generate a unique filename");
+  lua_pushstring(L, buff);
+  return 1;
+}
+
+
+static int os_getenv (lua_State *L) {
+  lua_pushstring(L, getenv(luaL_checkstring(L, 1)));  /* if NULL push nil */
+  return 1;
+}
+
+
+static int os_clock (lua_State *L) {
+  lua_pushnumber(L, ((lua_Number)clock())/(lua_Number)CLOCKS_PER_SEC);
+  return 1;
+}
+
+
+/*
+** {======================================================
+** Time/Date
operations +** { year=%Y, month=%m, day=%d, hour=%H, min=%M, sec=%S, +** wday=%w+1, yday=%j, isdst=? } +** ======================================================= +*/ + +static void setfield (lua_State *L, const char *key, int value) { + lua_pushinteger(L, value); + lua_setfield(L, -2, key); +} + +static void setboolfield (lua_State *L, const char *key, int value) { + if (value < 0) /* undefined? */ + return; /* does not set field */ + lua_pushboolean(L, value); + lua_setfield(L, -2, key); +} + +static int getboolfield (lua_State *L, const char *key) { + int res; + lua_getfield(L, -1, key); + res = lua_isnil(L, -1) ? -1 : lua_toboolean(L, -1); + lua_pop(L, 1); + return res; +} + + +static int getfield (lua_State *L, const char *key, int d) { + int res, isnum; + lua_getfield(L, -1, key); + res = (int)lua_tointegerx(L, -1, &isnum); + if (!isnum) { + if (d < 0) + return luaL_error(L, "field " LUA_QS " missing in date table", key); + res = d; + } + lua_pop(L, 1); + return res; +} + + +static const char *checkoption (lua_State *L, const char *conv, char *buff) { + static const char *const options[] = LUA_STRFTIMEOPTIONS; + unsigned int i; + for (i = 0; i < sizeof(options)/sizeof(options[0]); i += 2) { + if (*conv != '\0' && strchr(options[i], *conv) != NULL) { + buff[1] = *conv; + if (*options[i + 1] == '\0') { /* one-char conversion specifier? */ + buff[2] = '\0'; /* end buffer */ + return conv + 1; + } + else if (*(conv + 1) != '\0' && + strchr(options[i + 1], *(conv + 1)) != NULL) { + buff[2] = *(conv + 1); /* valid two-char conversion specifier */ + buff[3] = '\0'; /* end buffer */ + return conv + 2; + } + } + } + luaL_argerror(L, 1, + lua_pushfstring(L, "invalid conversion specifier '%%%s'", conv)); + return conv; /* to avoid warnings */ +} + + +static int os_date (lua_State *L) { + const char *s = luaL_optstring(L, 1, "%c"); + time_t t = luaL_opt(L, (time_t)luaL_checknumber, 2, time(NULL)); + struct tm tmr, *stm; + if (*s == '!') { /* UTC? */ + stm = l_gmtime(&t, &tmr); + s++; /* skip `!' */ + } + else + stm = l_localtime(&t, &tmr); + if (stm == NULL) /* invalid date? */ + lua_pushnil(L); + else if (strcmp(s, "*t") == 0) { + lua_createtable(L, 0, 9); /* 9 = number of fields */ + setfield(L, "sec", stm->tm_sec); + setfield(L, "min", stm->tm_min); + setfield(L, "hour", stm->tm_hour); + setfield(L, "day", stm->tm_mday); + setfield(L, "month", stm->tm_mon+1); + setfield(L, "year", stm->tm_year+1900); + setfield(L, "wday", stm->tm_wday+1); + setfield(L, "yday", stm->tm_yday+1); + setboolfield(L, "isdst", stm->tm_isdst); + } + else { + char cc[4]; + luaL_Buffer b; + cc[0] = '%'; + luaL_buffinit(L, &b); + while (*s) { + if (*s != '%') /* no conversion specifier? */ + luaL_addchar(&b, *s++); + else { + size_t reslen; + char buff[200]; /* should be big enough for any conversion result */ + s = checkoption(L, s + 1, cc); + reslen = strftime(buff, sizeof(buff), cc, stm); + luaL_addlstring(&b, buff, reslen); + } + } + luaL_pushresult(&b); + } + return 1; +} + + +static int os_time (lua_State *L) { + time_t t; + if (lua_isnoneornil(L, 1)) /* called without args? 
*/
+    t = time(NULL);  /* get current time */
+  else {
+    struct tm ts;
+    luaL_checktype(L, 1, LUA_TTABLE);
+    lua_settop(L, 1);  /* make sure table is at the top */
+    ts.tm_sec = getfield(L, "sec", 0);
+    ts.tm_min = getfield(L, "min", 0);
+    ts.tm_hour = getfield(L, "hour", 12);
+    ts.tm_mday = getfield(L, "day", -1);
+    ts.tm_mon = getfield(L, "month", -1) - 1;
+    ts.tm_year = getfield(L, "year", -1) - 1900;
+    ts.tm_isdst = getboolfield(L, "isdst");
+    t = mktime(&ts);
+  }
+  if (t == (time_t)(-1))
+    lua_pushnil(L);
+  else
+    lua_pushnumber(L, (lua_Number)t);
+  return 1;
+}
+
+
+static int os_difftime (lua_State *L) {
+  lua_pushnumber(L, difftime((time_t)(luaL_checknumber(L, 1)),
+                             (time_t)(luaL_optnumber(L, 2, 0))));
+  return 1;
+}
+
+/* }====================================================== */
+
+
+static int os_setlocale (lua_State *L) {
+  static const int cat[] = {LC_ALL, LC_COLLATE, LC_CTYPE, LC_MONETARY,
+                      LC_NUMERIC, LC_TIME};
+  static const char *const catnames[] = {"all", "collate", "ctype", "monetary",
+     "numeric", "time", NULL};
+  const char *l = luaL_optstring(L, 1, NULL);
+  int op = luaL_checkoption(L, 2, "all", catnames);
+  lua_pushstring(L, setlocale(cat[op], l));
+  return 1;
+}
+
+
+static int os_exit (lua_State *L) {
+  int status;
+  if (lua_isboolean(L, 1))
+    status = (lua_toboolean(L, 1) ? EXIT_SUCCESS : EXIT_FAILURE);
+  else
+    status = luaL_optint(L, 1, EXIT_SUCCESS);
+  if (lua_toboolean(L, 2))
+    lua_close(L);
+  if (L) exit(status);  /* 'if' to avoid warnings for unreachable 'return' */
+  return 0;
+}
+
+
+static const luaL_Reg syslib[] = {
+  {"clock",     os_clock},
+  {"date",      os_date},
+  {"difftime",  os_difftime},
+  {"execute",   os_execute},
+  {"exit",      os_exit},
+  {"getenv",    os_getenv},
+  {"remove",    os_remove},
+  {"rename",    os_rename},
+  {"setlocale", os_setlocale},
+  {"time",      os_time},
+  {"tmpname",   os_tmpname},
+  {NULL, NULL}
+};
+
+/* }====================================================== */
+
+
+
+LUAMOD_API int luaopen_os (lua_State *L) {
+  luaL_newlib(L, syslib);
+  return 1;
+}
+
diff --git a/ext/lua/src/lparser.c b/ext/lua/src/lparser.c
new file mode 100644
index 000000000..d8f5b4ffc
--- /dev/null
+++ b/ext/lua/src/lparser.c
@@ -0,0 +1,1638 @@
+/*
+** $Id: lparser.c,v 2.130 2013/02/06 13:37:39 roberto Exp $
+** Lua Parser
+** See Copyright Notice in lua.h
+*/
+
+
+#include <string.h>
+
+#define lparser_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "lcode.h"
+#include "ldebug.h"
+#include "ldo.h"
+#include "lfunc.h"
+#include "llex.h"
+#include "lmem.h"
+#include "lobject.h"
+#include "lopcodes.h"
+#include "lparser.h"
+#include "lstate.h"
+#include "lstring.h"
+#include "ltable.h"
+
+
+
+/* maximum number of local variables per function (must be smaller
+   than 250, due to the bytecode format) */
+#define MAXVARS		200
+
+
+#define hasmultret(k)		((k) == VCALL || (k) == VVARARG)
+
+
+
+/*
+** nodes for block list (list of active blocks)
+*/
+typedef struct BlockCnt {
+  struct BlockCnt *previous;  /* chain */
+  short firstlabel;  /* index of first label in this block */
+  short firstgoto;  /* index of first pending goto in this block */
+  lu_byte nactvar;  /* # active locals outside the block */
+  lu_byte upval;  /* true if some variable in the block is an upvalue */
+  lu_byte isloop;  /* true if `block' is a loop */
+} BlockCnt;
+
+
+
+/*
+** prototypes for recursive non-terminal functions
+*/
+static void statement (LexState *ls);
+static void expr (LexState *ls, expdesc *v);
+
+
+static void anchor_token (LexState *ls) {
+  /* last token from outer function must be EOS */
+
lua_assert(ls->fs != NULL || ls->t.token == TK_EOS); + if (ls->t.token == TK_NAME || ls->t.token == TK_STRING) { + TString *ts = ls->t.seminfo.ts; + luaX_newstring(ls, getstr(ts), ts->tsv.len); + } +} + + +/* semantic error */ +static l_noret semerror (LexState *ls, const char *msg) { + ls->t.token = 0; /* remove 'near to' from final message */ + luaX_syntaxerror(ls, msg); +} + + +static l_noret error_expected (LexState *ls, int token) { + luaX_syntaxerror(ls, + luaO_pushfstring(ls->L, "%s expected", luaX_token2str(ls, token))); +} + + +static l_noret errorlimit (FuncState *fs, int limit, const char *what) { + lua_State *L = fs->ls->L; + const char *msg; + int line = fs->f->linedefined; + const char *where = (line == 0) + ? "main function" + : luaO_pushfstring(L, "function at line %d", line); + msg = luaO_pushfstring(L, "too many %s (limit is %d) in %s", + what, limit, where); + luaX_syntaxerror(fs->ls, msg); +} + + +static void checklimit (FuncState *fs, int v, int l, const char *what) { + if (v > l) errorlimit(fs, l, what); +} + + +static int testnext (LexState *ls, int c) { + if (ls->t.token == c) { + luaX_next(ls); + return 1; + } + else return 0; +} + + +static void check (LexState *ls, int c) { + if (ls->t.token != c) + error_expected(ls, c); +} + + +static void checknext (LexState *ls, int c) { + check(ls, c); + luaX_next(ls); +} + + +#define check_condition(ls,c,msg) { if (!(c)) luaX_syntaxerror(ls, msg); } + + + +static void check_match (LexState *ls, int what, int who, int where) { + if (!testnext(ls, what)) { + if (where == ls->linenumber) + error_expected(ls, what); + else { + luaX_syntaxerror(ls, luaO_pushfstring(ls->L, + "%s expected (to close %s at line %d)", + luaX_token2str(ls, what), luaX_token2str(ls, who), where)); + } + } +} + + +static TString *str_checkname (LexState *ls) { + TString *ts; + check(ls, TK_NAME); + ts = ls->t.seminfo.ts; + luaX_next(ls); + return ts; +} + + +static void init_exp (expdesc *e, expkind k, int i) { + e->f = e->t = NO_JUMP; + e->k = k; + e->u.info = i; +} + + +static void codestring (LexState *ls, expdesc *e, TString *s) { + init_exp(e, VK, luaK_stringK(ls->fs, s)); +} + + +static void checkname (LexState *ls, expdesc *e) { + codestring(ls, e, str_checkname(ls)); +} + + +static int registerlocalvar (LexState *ls, TString *varname) { + FuncState *fs = ls->fs; + Proto *f = fs->f; + int oldsize = f->sizelocvars; + luaM_growvector(ls->L, f->locvars, fs->nlocvars, f->sizelocvars, + LocVar, SHRT_MAX, "local variables"); + while (oldsize < f->sizelocvars) f->locvars[oldsize++].varname = NULL; + f->locvars[fs->nlocvars].varname = varname; + luaC_objbarrier(ls->L, f, varname); + return fs->nlocvars++; +} + + +static void new_localvar (LexState *ls, TString *name) { + FuncState *fs = ls->fs; + Dyndata *dyd = ls->dyd; + int reg = registerlocalvar(ls, name); + checklimit(fs, dyd->actvar.n + 1 - fs->firstlocal, + MAXVARS, "local variables"); + luaM_growvector(ls->L, dyd->actvar.arr, dyd->actvar.n + 1, + dyd->actvar.size, Vardesc, MAX_INT, "local variables"); + dyd->actvar.arr[dyd->actvar.n++].idx = cast(short, reg); +} + + +static void new_localvarliteral_ (LexState *ls, const char *name, size_t sz) { + new_localvar(ls, luaX_newstring(ls, name, sz)); +} + +#define new_localvarliteral(ls,v) \ + new_localvarliteral_(ls, "" v, (sizeof(v)/sizeof(char))-1) + + +static LocVar *getlocvar (FuncState *fs, int i) { + int idx = fs->ls->dyd->actvar.arr[fs->firstlocal + i].idx; + lua_assert(idx < fs->nlocvars); + return &fs->f->locvars[idx]; +} + + +static void 
adjustlocalvars (LexState *ls, int nvars) { + FuncState *fs = ls->fs; + fs->nactvar = cast_byte(fs->nactvar + nvars); + for (; nvars; nvars--) { + getlocvar(fs, fs->nactvar - nvars)->startpc = fs->pc; + } +} + + +static void removevars (FuncState *fs, int tolevel) { + fs->ls->dyd->actvar.n -= (fs->nactvar - tolevel); + while (fs->nactvar > tolevel) + getlocvar(fs, --fs->nactvar)->endpc = fs->pc; +} + + +static int searchupvalue (FuncState *fs, TString *name) { + int i; + Upvaldesc *up = fs->f->upvalues; + for (i = 0; i < fs->nups; i++) { + if (luaS_eqstr(up[i].name, name)) return i; + } + return -1; /* not found */ +} + + +static int newupvalue (FuncState *fs, TString *name, expdesc *v) { + Proto *f = fs->f; + int oldsize = f->sizeupvalues; + checklimit(fs, fs->nups + 1, MAXUPVAL, "upvalues"); + luaM_growvector(fs->ls->L, f->upvalues, fs->nups, f->sizeupvalues, + Upvaldesc, MAXUPVAL, "upvalues"); + while (oldsize < f->sizeupvalues) f->upvalues[oldsize++].name = NULL; + f->upvalues[fs->nups].instack = (v->k == VLOCAL); + f->upvalues[fs->nups].idx = cast_byte(v->u.info); + f->upvalues[fs->nups].name = name; + luaC_objbarrier(fs->ls->L, f, name); + return fs->nups++; +} + + +static int searchvar (FuncState *fs, TString *n) { + int i; + for (i = cast_int(fs->nactvar) - 1; i >= 0; i--) { + if (luaS_eqstr(n, getlocvar(fs, i)->varname)) + return i; + } + return -1; /* not found */ +} + + +/* + Mark block where variable at given level was defined + (to emit close instructions later). +*/ +static void markupval (FuncState *fs, int level) { + BlockCnt *bl = fs->bl; + while (bl->nactvar > level) bl = bl->previous; + bl->upval = 1; +} + + +/* + Find variable with given name 'n'. If it is an upvalue, add this + upvalue into all intermediate functions. +*/ +static int singlevaraux (FuncState *fs, TString *n, expdesc *var, int base) { + if (fs == NULL) /* no more levels? */ + return VVOID; /* default is global */ + else { + int v = searchvar(fs, n); /* look up locals at current level */ + if (v >= 0) { /* found? */ + init_exp(var, VLOCAL, v); /* variable is local */ + if (!base) + markupval(fs, v); /* local will be used as an upval */ + return VLOCAL; + } + else { /* not found as local at current level; try upvalues */ + int idx = searchupvalue(fs, n); /* try existing upvalues */ + if (idx < 0) { /* not found? */ + if (singlevaraux(fs->prev, n, var, 0) == VVOID) /* try upper levels */ + return VVOID; /* not found; is a global */ + /* else was LOCAL or UPVAL */ + idx = newupvalue(fs, n, var); /* will be a new upvalue */ + } + init_exp(var, VUPVAL, idx); + return VUPVAL; + } + } +} + + +static void singlevar (LexState *ls, expdesc *var) { + TString *varname = str_checkname(ls); + FuncState *fs = ls->fs; + if (singlevaraux(fs, varname, var, 1) == VVOID) { /* global name? */ + expdesc key; + singlevaraux(fs, ls->envn, var, 1); /* get environment variable */ + lua_assert(var->k == VLOCAL || var->k == VUPVAL); + codestring(ls, &key, varname); /* key is variable name */ + luaK_indexed(fs, var, &key); /* env[varname] */ + } +} + + +static void adjust_assign (LexState *ls, int nvars, int nexps, expdesc *e) { + FuncState *fs = ls->fs; + int extra = nvars - nexps; + if (hasmultret(e->k)) { + extra++; /* includes call itself */ + if (extra < 0) extra = 0; + luaK_setreturns(fs, e, extra); /* last exp. 
provides the difference */
+    if (extra > 1) luaK_reserveregs(fs, extra-1);
+  }
+  else {
+    if (e->k != VVOID) luaK_exp2nextreg(fs, e);  /* close last expression */
+    if (extra > 0) {
+      int reg = fs->freereg;
+      luaK_reserveregs(fs, extra);
+      luaK_nil(fs, reg, extra);
+    }
+  }
+}
+
+
+static void enterlevel (LexState *ls) {
+  lua_State *L = ls->L;
+  ++L->nCcalls;
+  checklimit(ls->fs, L->nCcalls, LUAI_MAXCCALLS, "C levels");
+}
+
+
+#define leavelevel(ls)	((ls)->L->nCcalls--)
+
+
+static void closegoto (LexState *ls, int g, Labeldesc *label) {
+  int i;
+  FuncState *fs = ls->fs;
+  Labellist *gl = &ls->dyd->gt;
+  Labeldesc *gt = &gl->arr[g];
+  lua_assert(luaS_eqstr(gt->name, label->name));
+  if (gt->nactvar < label->nactvar) {
+    TString *vname = getlocvar(fs, gt->nactvar)->varname;
+    const char *msg = luaO_pushfstring(ls->L,
+      "<goto %s> at line %d jumps into the scope of local " LUA_QS,
+      getstr(gt->name), gt->line, getstr(vname));
+    semerror(ls, msg);
+  }
+  luaK_patchlist(fs, gt->pc, label->pc);
+  /* remove goto from pending list */
+  for (i = g; i < gl->n - 1; i++)
+    gl->arr[i] = gl->arr[i + 1];
+  gl->n--;
+}
+
+
+/*
+** try to close a goto with existing labels; this solves backward jumps
+*/
+static int findlabel (LexState *ls, int g) {
+  int i;
+  BlockCnt *bl = ls->fs->bl;
+  Dyndata *dyd = ls->dyd;
+  Labeldesc *gt = &dyd->gt.arr[g];
+  /* check labels in current block for a match */
+  for (i = bl->firstlabel; i < dyd->label.n; i++) {
+    Labeldesc *lb = &dyd->label.arr[i];
+    if (luaS_eqstr(lb->name, gt->name)) {  /* correct label? */
+      if (gt->nactvar > lb->nactvar &&
+          (bl->upval || dyd->label.n > bl->firstlabel))
+        luaK_patchclose(ls->fs, gt->pc, lb->nactvar);
+      closegoto(ls, g, lb);  /* close it */
+      return 1;
+    }
+  }
+  return 0;  /* label not found; cannot close goto */
+}
+
+
+static int newlabelentry (LexState *ls, Labellist *l, TString *name,
+                          int line, int pc) {
+  int n = l->n;
+  luaM_growvector(ls->L, l->arr, n, l->size,
+                  Labeldesc, SHRT_MAX, "labels/gotos");
+  l->arr[n].name = name;
+  l->arr[n].line = line;
+  l->arr[n].nactvar = ls->fs->nactvar;
+  l->arr[n].pc = pc;
+  l->n++;
+  return n;
+}
+
+
+/*
+** check whether new label 'lb' matches any pending gotos in current
+** block; solves forward jumps
+*/
+static void findgotos (LexState *ls, Labeldesc *lb) {
+  Labellist *gl = &ls->dyd->gt;
+  int i = ls->fs->bl->firstgoto;
+  while (i < gl->n) {
+    if (luaS_eqstr(gl->arr[i].name, lb->name))
+      closegoto(ls, i, lb);
+    else
+      i++;
+  }
+}
+
+
+/*
+** "export" pending gotos to outer level, to check them against
+** outer labels; if the block being exited has upvalues, and
+** the goto exits the scope of any variable (which can be the
+** upvalue), close those variables being exited.
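+**
+** For example:
+**
+**     do
+**       local x
+**       local f = function () return x end
+**       goto out
+**     end
+**     ::out::
+**
+** The pending goto leaves the scope of 'x'; because 'x' is captured
+** by 'f', the block has an upvalue, so the exported goto is patched
+** to close 'x' and re-based to the variable level of the outer block.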
+*/
+static void movegotosout (FuncState *fs, BlockCnt *bl) {
+  int i = bl->firstgoto;
+  Labellist *gl = &fs->ls->dyd->gt;
+  /* correct pending gotos to current block and try to close it
+     with visible labels */
+  while (i < gl->n) {
+    Labeldesc *gt = &gl->arr[i];
+    if (gt->nactvar > bl->nactvar) {
+      if (bl->upval)
+        luaK_patchclose(fs, gt->pc, bl->nactvar);
+      gt->nactvar = bl->nactvar;
+    }
+    if (!findlabel(fs->ls, i))
+      i++;  /* move to next one */
+  }
+}
+
+
+static void enterblock (FuncState *fs, BlockCnt *bl, lu_byte isloop) {
+  bl->isloop = isloop;
+  bl->nactvar = fs->nactvar;
+  bl->firstlabel = fs->ls->dyd->label.n;
+  bl->firstgoto = fs->ls->dyd->gt.n;
+  bl->upval = 0;
+  bl->previous = fs->bl;
+  fs->bl = bl;
+  lua_assert(fs->freereg == fs->nactvar);
+}
+
+
+/*
+** create a label named "break" to resolve break statements
+*/
+static void breaklabel (LexState *ls) {
+  TString *n = luaS_new(ls->L, "break");
+  int l = newlabelentry(ls, &ls->dyd->label, n, 0, ls->fs->pc);
+  findgotos(ls, &ls->dyd->label.arr[l]);
+}
+
+/*
+** generates an error for an undefined 'goto'; choose appropriate
+** message when label name is a reserved word (which can only be 'break')
+*/
+static l_noret undefgoto (LexState *ls, Labeldesc *gt) {
+  const char *msg = isreserved(gt->name)
+                    ? "<%s> at line %d not inside a loop"
+                    : "no visible label " LUA_QS " for <goto> at line %d";
+  msg = luaO_pushfstring(ls->L, msg, getstr(gt->name), gt->line);
+  semerror(ls, msg);
+}
+
+
+static void leaveblock (FuncState *fs) {
+  BlockCnt *bl = fs->bl;
+  LexState *ls = fs->ls;
+  if (bl->previous && bl->upval) {
+    /* create a 'jump to here' to close upvalues */
+    int j = luaK_jump(fs);
+    luaK_patchclose(fs, j, bl->nactvar);
+    luaK_patchtohere(fs, j);
+  }
+  if (bl->isloop)
+    breaklabel(ls);  /* close pending breaks */
+  fs->bl = bl->previous;
+  removevars(fs, bl->nactvar);
+  lua_assert(bl->nactvar == fs->nactvar);
+  fs->freereg = fs->nactvar;  /* free registers */
+  ls->dyd->label.n = bl->firstlabel;  /* remove local labels */
+  if (bl->previous)  /* inner block? */
+    movegotosout(fs, bl);  /* update pending gotos to outer block */
+  else if (bl->firstgoto < ls->dyd->gt.n)  /* pending gotos in outer block? */
+    undefgoto(ls, &ls->dyd->gt.arr[bl->firstgoto]);  /* error */
+}
+
+
+/*
+** adds a new prototype into list of prototypes
+*/
+static Proto *addprototype (LexState *ls) {
+  Proto *clp;
+  lua_State *L = ls->L;
+  FuncState *fs = ls->fs;
+  Proto *f = fs->f;  /* prototype of current function */
+  if (fs->np >= f->sizep) {
+    int oldsize = f->sizep;
+    luaM_growvector(L, f->p, fs->np, f->sizep, Proto *, MAXARG_Bx, "functions");
+    while (oldsize < f->sizep) f->p[oldsize++] = NULL;
+  }
+  f->p[fs->np++] = clp = luaF_newproto(L);
+  luaC_objbarrier(L, f, clp);
+  return clp;
+}
+
+
+/*
+** codes instruction to create new closure in parent function.
+** The OP_CLOSURE instruction must use the last available register,
+** so that, if it invokes the GC, the GC knows which registers
+** are in use at that time.
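+**
+** For instance, in 'local f = function () ... end' the CLOSURE result
+** goes into the next free register (the one just reserved for 'f');
+** luaK_exp2nextreg below pins it there, so a collection triggered
+** while the closure is being built sees every live register.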
+*/ +static void codeclosure (LexState *ls, expdesc *v) { + FuncState *fs = ls->fs->prev; + init_exp(v, VRELOCABLE, luaK_codeABx(fs, OP_CLOSURE, 0, fs->np - 1)); + luaK_exp2nextreg(fs, v); /* fix it at the last register */ +} + + +static void open_func (LexState *ls, FuncState *fs, BlockCnt *bl) { + lua_State *L = ls->L; + Proto *f; + fs->prev = ls->fs; /* linked list of funcstates */ + fs->ls = ls; + ls->fs = fs; + fs->pc = 0; + fs->lasttarget = 0; + fs->jpc = NO_JUMP; + fs->freereg = 0; + fs->nk = 0; + fs->np = 0; + fs->nups = 0; + fs->nlocvars = 0; + fs->nactvar = 0; + fs->firstlocal = ls->dyd->actvar.n; + fs->bl = NULL; + f = fs->f; + f->source = ls->source; + f->maxstacksize = 2; /* registers 0/1 are always valid */ + fs->h = luaH_new(L); + /* anchor table of constants (to avoid being collected) */ + sethvalue2s(L, L->top, fs->h); + incr_top(L); + enterblock(fs, bl, 0); +} + + +static void close_func (LexState *ls) { + lua_State *L = ls->L; + FuncState *fs = ls->fs; + Proto *f = fs->f; + luaK_ret(fs, 0, 0); /* final return */ + leaveblock(fs); + luaM_reallocvector(L, f->code, f->sizecode, fs->pc, Instruction); + f->sizecode = fs->pc; + luaM_reallocvector(L, f->lineinfo, f->sizelineinfo, fs->pc, int); + f->sizelineinfo = fs->pc; + luaM_reallocvector(L, f->k, f->sizek, fs->nk, TValue); + f->sizek = fs->nk; + luaM_reallocvector(L, f->p, f->sizep, fs->np, Proto *); + f->sizep = fs->np; + luaM_reallocvector(L, f->locvars, f->sizelocvars, fs->nlocvars, LocVar); + f->sizelocvars = fs->nlocvars; + luaM_reallocvector(L, f->upvalues, f->sizeupvalues, fs->nups, Upvaldesc); + f->sizeupvalues = fs->nups; + lua_assert(fs->bl == NULL); + ls->fs = fs->prev; + /* last token read was anchored in defunct function; must re-anchor it */ + anchor_token(ls); + L->top--; /* pop table of constants */ + luaC_checkGC(L); +} + + + +/*============================================================*/ +/* GRAMMAR RULES */ +/*============================================================*/ + + +/* +** check whether current token is in the follow set of a block. +** 'until' closes syntactical blocks, but do not close scope, +** so it handled in separate. +*/ +static int block_follow (LexState *ls, int withuntil) { + switch (ls->t.token) { + case TK_ELSE: case TK_ELSEIF: + case TK_END: case TK_EOS: + return 1; + case TK_UNTIL: return withuntil; + default: return 0; + } +} + + +static void statlist (LexState *ls) { + /* statlist -> { stat [`;'] } */ + while (!block_follow(ls, 1)) { + if (ls->t.token == TK_RETURN) { + statement(ls); + return; /* 'return' must be last statement */ + } + statement(ls); + } +} + + +static void fieldsel (LexState *ls, expdesc *v) { + /* fieldsel -> ['.' 
| ':'] NAME */ + FuncState *fs = ls->fs; + expdesc key; + luaK_exp2anyregup(fs, v); + luaX_next(ls); /* skip the dot or colon */ + checkname(ls, &key); + luaK_indexed(fs, v, &key); +} + + +static void yindex (LexState *ls, expdesc *v) { + /* index -> '[' expr ']' */ + luaX_next(ls); /* skip the '[' */ + expr(ls, v); + luaK_exp2val(ls->fs, v); + checknext(ls, ']'); +} + + +/* +** {====================================================================== +** Rules for Constructors +** ======================================================================= +*/ + + +struct ConsControl { + expdesc v; /* last list item read */ + expdesc *t; /* table descriptor */ + int nh; /* total number of `record' elements */ + int na; /* total number of array elements */ + int tostore; /* number of array elements pending to be stored */ +}; + + +static void recfield (LexState *ls, struct ConsControl *cc) { + /* recfield -> (NAME | `['exp1`]') = exp1 */ + FuncState *fs = ls->fs; + int reg = ls->fs->freereg; + expdesc key, val; + int rkkey; + if (ls->t.token == TK_NAME) { + checklimit(fs, cc->nh, MAX_INT, "items in a constructor"); + checkname(ls, &key); + } + else /* ls->t.token == '[' */ + yindex(ls, &key); + cc->nh++; + checknext(ls, '='); + rkkey = luaK_exp2RK(fs, &key); + expr(ls, &val); + luaK_codeABC(fs, OP_SETTABLE, cc->t->u.info, rkkey, luaK_exp2RK(fs, &val)); + fs->freereg = reg; /* free registers */ +} + + +static void closelistfield (FuncState *fs, struct ConsControl *cc) { + if (cc->v.k == VVOID) return; /* there is no list item */ + luaK_exp2nextreg(fs, &cc->v); + cc->v.k = VVOID; + if (cc->tostore == LFIELDS_PER_FLUSH) { + luaK_setlist(fs, cc->t->u.info, cc->na, cc->tostore); /* flush */ + cc->tostore = 0; /* no more items pending */ + } +} + + +static void lastlistfield (FuncState *fs, struct ConsControl *cc) { + if (cc->tostore == 0) return; + if (hasmultret(cc->v.k)) { + luaK_setmultret(fs, &cc->v); + luaK_setlist(fs, cc->t->u.info, cc->na, LUA_MULTRET); + cc->na--; /* do not count last expression (unknown number of elements) */ + } + else { + if (cc->v.k != VVOID) + luaK_exp2nextreg(fs, &cc->v); + luaK_setlist(fs, cc->t->u.info, cc->na, cc->tostore); + } +} + + +static void listfield (LexState *ls, struct ConsControl *cc) { + /* listfield -> exp */ + expr(ls, &cc->v); + checklimit(ls->fs, cc->na, MAX_INT, "items in a constructor"); + cc->na++; + cc->tostore++; +} + + +static void field (LexState *ls, struct ConsControl *cc) { + /* field -> listfield | recfield */ + switch(ls->t.token) { + case TK_NAME: { /* may be 'listfield' or 'recfield' */ + if (luaX_lookahead(ls) != '=') /* expression? 
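(a NAME followed by '=' is a record field; any other NAME begins an ordinary expression, stored as an array item)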
*/
+        listfield(ls, cc);
+      else
+        recfield(ls, cc);
+      break;
+    }
+    case '[': {
+      recfield(ls, cc);
+      break;
+    }
+    default: {
+      listfield(ls, cc);
+      break;
+    }
+  }
+}
+
+
+static void constructor (LexState *ls, expdesc *t) {
+  /* constructor -> '{' [ field { sep field } [sep] ] '}'
+     sep -> ',' | ';' */
+  FuncState *fs = ls->fs;
+  int line = ls->linenumber;
+  int pc = luaK_codeABC(fs, OP_NEWTABLE, 0, 0, 0);
+  struct ConsControl cc;
+  cc.na = cc.nh = cc.tostore = 0;
+  cc.t = t;
+  init_exp(t, VRELOCABLE, pc);
+  init_exp(&cc.v, VVOID, 0);  /* no value (yet) */
+  luaK_exp2nextreg(ls->fs, t);  /* fix it at stack top */
+  checknext(ls, '{');
+  do {
+    lua_assert(cc.v.k == VVOID || cc.tostore > 0);
+    if (ls->t.token == '}') break;
+    closelistfield(fs, &cc);
+    field(ls, &cc);
+  } while (testnext(ls, ',') || testnext(ls, ';'));
+  check_match(ls, '}', '{', line);
+  lastlistfield(fs, &cc);
+  SETARG_B(fs->f->code[pc], luaO_int2fb(cc.na)); /* set initial array size */
+  SETARG_C(fs->f->code[pc], luaO_int2fb(cc.nh));  /* set initial table size */
+}
+
+/* }====================================================================== */
+
+
+
+static void parlist (LexState *ls) {
+  /* parlist -> [ param { `,' param } ] */
+  FuncState *fs = ls->fs;
+  Proto *f = fs->f;
+  int nparams = 0;
+  f->is_vararg = 0;
+  if (ls->t.token != ')') {  /* is `parlist' not empty? */
+    do {
+      switch (ls->t.token) {
+        case TK_NAME: {  /* param -> NAME */
+          new_localvar(ls, str_checkname(ls));
+          nparams++;
+          break;
+        }
+        case TK_DOTS: {  /* param -> `...' */
+          luaX_next(ls);
+          f->is_vararg = 1;
+          break;
+        }
+        default: luaX_syntaxerror(ls, "<name> or " LUA_QL("...") " expected");
+      }
+    } while (!f->is_vararg && testnext(ls, ','));
+  }
+  adjustlocalvars(ls, nparams);
+  f->numparams = cast_byte(fs->nactvar);
+  luaK_reserveregs(fs, fs->nactvar);  /* reserve register for parameters */
+}
+
+
+static void body (LexState *ls, expdesc *e, int ismethod, int line) {
+  /* body ->  `(' parlist `)' block END */
+  FuncState new_fs;
+  BlockCnt bl;
+  new_fs.f = addprototype(ls);
+  new_fs.f->linedefined = line;
+  open_func(ls, &new_fs, &bl);
+  checknext(ls, '(');
+  if (ismethod) {
+    new_localvarliteral(ls, "self");  /* create 'self' parameter */
+    adjustlocalvars(ls, 1);
+  }
+  parlist(ls);
+  checknext(ls, ')');
+  statlist(ls);
+  new_fs.f->lastlinedefined = ls->linenumber;
+  check_match(ls, TK_END, TK_FUNCTION, line);
+  codeclosure(ls, e);
+  close_func(ls);
+}
+
+
+static int explist (LexState *ls, expdesc *v) {
+  /* explist -> expr { `,' expr } */
+  int n = 1;  /* at least one expression */
+  expr(ls, v);
+  while (testnext(ls, ',')) {
+    luaK_exp2nextreg(ls->fs, v);
+    expr(ls, v);
+    n++;
+  }
+  return n;
+}
+
+
+static void funcargs (LexState *ls, expdesc *f, int line) {
+  FuncState *fs = ls->fs;
+  expdesc args;
+  int base, nparams;
+  switch (ls->t.token) {
+    case '(': {  /* funcargs -> `(' [ explist ] `)' */
+      luaX_next(ls);
+      if (ls->t.token == ')')  /* arg list is empty?
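(as in 'f()': the call is coded with no actual arguments)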
*/ + args.k = VVOID; + else { + explist(ls, &args); + luaK_setmultret(fs, &args); + } + check_match(ls, ')', '(', line); + break; + } + case '{': { /* funcargs -> constructor */ + constructor(ls, &args); + break; + } + case TK_STRING: { /* funcargs -> STRING */ + codestring(ls, &args, ls->t.seminfo.ts); + luaX_next(ls); /* must use `seminfo' before `next' */ + break; + } + default: { + luaX_syntaxerror(ls, "function arguments expected"); + } + } + lua_assert(f->k == VNONRELOC); + base = f->u.info; /* base register for call */ + if (hasmultret(args.k)) + nparams = LUA_MULTRET; /* open call */ + else { + if (args.k != VVOID) + luaK_exp2nextreg(fs, &args); /* close last argument */ + nparams = fs->freereg - (base+1); + } + init_exp(f, VCALL, luaK_codeABC(fs, OP_CALL, base, nparams+1, 2)); + luaK_fixline(fs, line); + fs->freereg = base+1; /* call remove function and arguments and leaves + (unless changed) one result */ +} + + + + +/* +** {====================================================================== +** Expression parsing +** ======================================================================= +*/ + + +static void primaryexp (LexState *ls, expdesc *v) { + /* primaryexp -> NAME | '(' expr ')' */ + switch (ls->t.token) { + case '(': { + int line = ls->linenumber; + luaX_next(ls); + expr(ls, v); + check_match(ls, ')', '(', line); + luaK_dischargevars(ls->fs, v); + return; + } + case TK_NAME: { + singlevar(ls, v); + return; + } + default: { + luaX_syntaxerror(ls, "unexpected symbol"); + } + } +} + + +static void suffixedexp (LexState *ls, expdesc *v) { + /* suffixedexp -> + primaryexp { '.' NAME | '[' exp ']' | ':' NAME funcargs | funcargs } */ + FuncState *fs = ls->fs; + int line = ls->linenumber; + primaryexp(ls, v); + for (;;) { + switch (ls->t.token) { + case '.': { /* fieldsel */ + fieldsel(ls, v); + break; + } + case '[': { /* `[' exp1 `]' */ + expdesc key; + luaK_exp2anyregup(fs, v); + yindex(ls, &key); + luaK_indexed(fs, v, &key); + break; + } + case ':': { /* `:' NAME funcargs */ + expdesc key; + luaX_next(ls); + checkname(ls, &key); + luaK_self(fs, v, &key); + funcargs(ls, v, line); + break; + } + case '(': case TK_STRING: case '{': { /* funcargs */ + luaK_exp2nextreg(fs, v); + funcargs(ls, v, line); + break; + } + default: return; + } + } +} + + +static void simpleexp (LexState *ls, expdesc *v) { + /* simpleexp -> NUMBER | STRING | NIL | TRUE | FALSE | ... 
| + constructor | FUNCTION body | suffixedexp */ + switch (ls->t.token) { + case TK_NUMBER: { + init_exp(v, VKNUM, 0); + v->u.nval = ls->t.seminfo.r; + break; + } + case TK_STRING: { + codestring(ls, v, ls->t.seminfo.ts); + break; + } + case TK_NIL: { + init_exp(v, VNIL, 0); + break; + } + case TK_TRUE: { + init_exp(v, VTRUE, 0); + break; + } + case TK_FALSE: { + init_exp(v, VFALSE, 0); + break; + } + case TK_DOTS: { /* vararg */ + FuncState *fs = ls->fs; + check_condition(ls, fs->f->is_vararg, + "cannot use " LUA_QL("...") " outside a vararg function"); + init_exp(v, VVARARG, luaK_codeABC(fs, OP_VARARG, 0, 1, 0)); + break; + } + case '{': { /* constructor */ + constructor(ls, v); + return; + } + case TK_FUNCTION: { + luaX_next(ls); + body(ls, v, 0, ls->linenumber); + return; + } + default: { + suffixedexp(ls, v); + return; + } + } + luaX_next(ls); +} + + +static UnOpr getunopr (int op) { + switch (op) { + case TK_NOT: return OPR_NOT; + case '-': return OPR_MINUS; + case '#': return OPR_LEN; + default: return OPR_NOUNOPR; + } +} + + +static BinOpr getbinopr (int op) { + switch (op) { + case '+': return OPR_ADD; + case '-': return OPR_SUB; + case '*': return OPR_MUL; + case '/': return OPR_DIV; + case '%': return OPR_MOD; + case '^': return OPR_POW; + case TK_CONCAT: return OPR_CONCAT; + case TK_NE: return OPR_NE; + case TK_EQ: return OPR_EQ; + case '<': return OPR_LT; + case TK_LE: return OPR_LE; + case '>': return OPR_GT; + case TK_GE: return OPR_GE; + case TK_AND: return OPR_AND; + case TK_OR: return OPR_OR; + default: return OPR_NOBINOPR; + } +} + + +static const struct { + lu_byte left; /* left priority for each binary operator */ + lu_byte right; /* right priority */ +} priority[] = { /* ORDER OPR */ + {6, 6}, {6, 6}, {7, 7}, {7, 7}, {7, 7}, /* `+' `-' `*' `/' `%' */ + {10, 9}, {5, 4}, /* ^, .. 
(right associative) */ + {3, 3}, {3, 3}, {3, 3}, /* ==, <, <= */ + {3, 3}, {3, 3}, {3, 3}, /* ~=, >, >= */ + {2, 2}, {1, 1} /* and, or */ +}; + +#define UNARY_PRIORITY 8 /* priority for unary operators */ + + +/* +** subexpr -> (simpleexp | unop subexpr) { binop subexpr } +** where `binop' is any binary operator with a priority higher than `limit' +*/ +static BinOpr subexpr (LexState *ls, expdesc *v, int limit) { + BinOpr op; + UnOpr uop; + enterlevel(ls); + uop = getunopr(ls->t.token); + if (uop != OPR_NOUNOPR) { + int line = ls->linenumber; + luaX_next(ls); + subexpr(ls, v, UNARY_PRIORITY); + luaK_prefix(ls->fs, uop, v, line); + } + else simpleexp(ls, v); + /* expand while operators have priorities higher than `limit' */ + op = getbinopr(ls->t.token); + while (op != OPR_NOBINOPR && priority[op].left > limit) { + expdesc v2; + BinOpr nextop; + int line = ls->linenumber; + luaX_next(ls); + luaK_infix(ls->fs, op, v); + /* read sub-expression with higher priority */ + nextop = subexpr(ls, &v2, priority[op].right); + luaK_posfix(ls->fs, op, v, &v2, line); + op = nextop; + } + leavelevel(ls); + return op; /* return first untreated operator */ +} + + +static void expr (LexState *ls, expdesc *v) { + subexpr(ls, v, 0); +} + +/* }==================================================================== */ + + + +/* +** {====================================================================== +** Rules for Statements +** ======================================================================= +*/ + + +static void block (LexState *ls) { + /* block -> statlist */ + FuncState *fs = ls->fs; + BlockCnt bl; + enterblock(fs, &bl, 0); + statlist(ls); + leaveblock(fs); +} + + +/* +** structure to chain all variables in the left-hand side of an +** assignment +*/ +struct LHS_assign { + struct LHS_assign *prev; + expdesc v; /* variable (global, local, upvalue, or indexed) */ +}; + + +/* +** check whether, in an assignment to an upvalue/local variable, the +** upvalue/local variable is begin used in a previous assignment to a +** table. If so, save original upvalue/local value in a safe place and +** use this safe copy in the previous assignment. +*/ +static void check_conflict (LexState *ls, struct LHS_assign *lh, expdesc *v) { + FuncState *fs = ls->fs; + int extra = fs->freereg; /* eventual position to save local variable */ + int conflict = 0; + for (; lh; lh = lh->prev) { /* check all previous assignments */ + if (lh->v.k == VINDEXED) { /* assigning to a table? */ + /* table is the upvalue/local being assigned now? */ + if (lh->v.u.ind.vt == v->k && lh->v.u.ind.t == v->u.info) { + conflict = 1; + lh->v.u.ind.vt = VLOCAL; + lh->v.u.ind.t = extra; /* previous assignment will use safe copy */ + } + /* index is the local being assigned? (index cannot be upvalue) */ + if (v->k == VLOCAL && lh->v.u.ind.idx == v->u.info) { + conflict = 1; + lh->v.u.ind.idx = extra; /* previous assignment will use safe copy */ + } + } + } + if (conflict) { + /* copy upvalue/local value to a temporary (in position 'extra') */ + OpCode op = (v->k == VLOCAL) ? 
OP_MOVE : OP_GETUPVAL; + luaK_codeABC(fs, op, extra, v->u.info, 0); + luaK_reserveregs(fs, 1); + } +} + + +static void assignment (LexState *ls, struct LHS_assign *lh, int nvars) { + expdesc e; + check_condition(ls, vkisvar(lh->v.k), "syntax error"); + if (testnext(ls, ',')) { /* assignment -> ',' suffixedexp assignment */ + struct LHS_assign nv; + nv.prev = lh; + suffixedexp(ls, &nv.v); + if (nv.v.k != VINDEXED) + check_conflict(ls, lh, &nv.v); + checklimit(ls->fs, nvars + ls->L->nCcalls, LUAI_MAXCCALLS, + "C levels"); + assignment(ls, &nv, nvars+1); + } + else { /* assignment -> `=' explist */ + int nexps; + checknext(ls, '='); + nexps = explist(ls, &e); + if (nexps != nvars) { + adjust_assign(ls, nvars, nexps, &e); + if (nexps > nvars) + ls->fs->freereg -= nexps - nvars; /* remove extra values */ + } + else { + luaK_setoneret(ls->fs, &e); /* close last expression */ + luaK_storevar(ls->fs, &lh->v, &e); + return; /* avoid default */ + } + } + init_exp(&e, VNONRELOC, ls->fs->freereg-1); /* default assignment */ + luaK_storevar(ls->fs, &lh->v, &e); +} + + +static int cond (LexState *ls) { + /* cond -> exp */ + expdesc v; + expr(ls, &v); /* read condition */ + if (v.k == VNIL) v.k = VFALSE; /* `falses' are all equal here */ + luaK_goiftrue(ls->fs, &v); + return v.f; +} + + +static void gotostat (LexState *ls, int pc) { + int line = ls->linenumber; + TString *label; + int g; + if (testnext(ls, TK_GOTO)) + label = str_checkname(ls); + else { + luaX_next(ls); /* skip break */ + label = luaS_new(ls->L, "break"); + } + g = newlabelentry(ls, &ls->dyd->gt, label, line, pc); + findlabel(ls, g); /* close it if label already defined */ +} + + +/* check for repeated labels on the same block */ +static void checkrepeated (FuncState *fs, Labellist *ll, TString *label) { + int i; + for (i = fs->bl->firstlabel; i < ll->n; i++) { + if (luaS_eqstr(label, ll->arr[i].name)) { + const char *msg = luaO_pushfstring(fs->ls->L, + "label " LUA_QS " already defined on line %d", + getstr(label), ll->arr[i].line); + semerror(fs->ls, msg); + } + } +} + + +/* skip no-op statements */ +static void skipnoopstat (LexState *ls) { + while (ls->t.token == ';' || ls->t.token == TK_DBCOLON) + statement(ls); +} + + +static void labelstat (LexState *ls, TString *label, int line) { + /* label -> '::' NAME '::' */ + FuncState *fs = ls->fs; + Labellist *ll = &ls->dyd->label; + int l; /* index of new label being created */ + checkrepeated(fs, ll, label); /* check for repeated labels */ + checknext(ls, TK_DBCOLON); /* skip double colon */ + /* create new entry for this label */ + l = newlabelentry(ls, ll, label, line, fs->pc); + skipnoopstat(ls); /* skip other no-op statements */ + if (block_follow(ls, 0)) { /* label is last no-op statement in the block? 
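(e.g. a '::done::' written right before 'end')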
*/ + /* assume that locals are already out of scope */ + ll->arr[l].nactvar = fs->bl->nactvar; + } + findgotos(ls, &ll->arr[l]); +} + + +static void whilestat (LexState *ls, int line) { + /* whilestat -> WHILE cond DO block END */ + FuncState *fs = ls->fs; + int whileinit; + int condexit; + BlockCnt bl; + luaX_next(ls); /* skip WHILE */ + whileinit = luaK_getlabel(fs); + condexit = cond(ls); + enterblock(fs, &bl, 1); + checknext(ls, TK_DO); + block(ls); + luaK_jumpto(fs, whileinit); + check_match(ls, TK_END, TK_WHILE, line); + leaveblock(fs); + luaK_patchtohere(fs, condexit); /* false conditions finish the loop */ +} + + +static void repeatstat (LexState *ls, int line) { + /* repeatstat -> REPEAT block UNTIL cond */ + int condexit; + FuncState *fs = ls->fs; + int repeat_init = luaK_getlabel(fs); + BlockCnt bl1, bl2; + enterblock(fs, &bl1, 1); /* loop block */ + enterblock(fs, &bl2, 0); /* scope block */ + luaX_next(ls); /* skip REPEAT */ + statlist(ls); + check_match(ls, TK_UNTIL, TK_REPEAT, line); + condexit = cond(ls); /* read condition (inside scope block) */ + if (bl2.upval) /* upvalues? */ + luaK_patchclose(fs, condexit, bl2.nactvar); + leaveblock(fs); /* finish scope */ + luaK_patchlist(fs, condexit, repeat_init); /* close the loop */ + leaveblock(fs); /* finish loop */ +} + + +static int exp1 (LexState *ls) { + expdesc e; + int reg; + expr(ls, &e); + luaK_exp2nextreg(ls->fs, &e); + lua_assert(e.k == VNONRELOC); + reg = e.u.info; + return reg; +} + + +static void forbody (LexState *ls, int base, int line, int nvars, int isnum) { + /* forbody -> DO block */ + BlockCnt bl; + FuncState *fs = ls->fs; + int prep, endfor; + adjustlocalvars(ls, 3); /* control variables */ + checknext(ls, TK_DO); + prep = isnum ? luaK_codeAsBx(fs, OP_FORPREP, base, NO_JUMP) : luaK_jump(fs); + enterblock(fs, &bl, 0); /* scope for declared variables */ + adjustlocalvars(ls, nvars); + luaK_reserveregs(fs, nvars); + block(ls); + leaveblock(fs); /* end of scope for declared variables */ + luaK_patchtohere(fs, prep); + if (isnum) /* numeric for? 
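(a 'for v = init,limit,step' loop, closed by one FORLOOP instruction)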
*/ + endfor = luaK_codeAsBx(fs, OP_FORLOOP, base, NO_JUMP); + else { /* generic for */ + luaK_codeABC(fs, OP_TFORCALL, base, 0, nvars); + luaK_fixline(fs, line); + endfor = luaK_codeAsBx(fs, OP_TFORLOOP, base + 2, NO_JUMP); + } + luaK_patchlist(fs, endfor, prep + 1); + luaK_fixline(fs, line); +} + + +static void fornum (LexState *ls, TString *varname, int line) { + /* fornum -> NAME = exp1,exp1[,exp1] forbody */ + FuncState *fs = ls->fs; + int base = fs->freereg; + new_localvarliteral(ls, "(for index)"); + new_localvarliteral(ls, "(for limit)"); + new_localvarliteral(ls, "(for step)"); + new_localvar(ls, varname); + checknext(ls, '='); + exp1(ls); /* initial value */ + checknext(ls, ','); + exp1(ls); /* limit */ + if (testnext(ls, ',')) + exp1(ls); /* optional step */ + else { /* default step = 1 */ + luaK_codek(fs, fs->freereg, luaK_numberK(fs, 1)); + luaK_reserveregs(fs, 1); + } + forbody(ls, base, line, 1, 1); +} + + +static void forlist (LexState *ls, TString *indexname) { + /* forlist -> NAME {,NAME} IN explist forbody */ + FuncState *fs = ls->fs; + expdesc e; + int nvars = 4; /* gen, state, control, plus at least one declared var */ + int line; + int base = fs->freereg; + /* create control variables */ + new_localvarliteral(ls, "(for generator)"); + new_localvarliteral(ls, "(for state)"); + new_localvarliteral(ls, "(for control)"); + /* create declared variables */ + new_localvar(ls, indexname); + while (testnext(ls, ',')) { + new_localvar(ls, str_checkname(ls)); + nvars++; + } + checknext(ls, TK_IN); + line = ls->linenumber; + adjust_assign(ls, 3, explist(ls, &e), &e); + luaK_checkstack(fs, 3); /* extra space to call generator */ + forbody(ls, base, line, nvars - 3, 0); +} + + +static void forstat (LexState *ls, int line) { + /* forstat -> FOR (fornum | forlist) END */ + FuncState *fs = ls->fs; + TString *varname; + BlockCnt bl; + enterblock(fs, &bl, 1); /* scope for loop and control variables */ + luaX_next(ls); /* skip `for' */ + varname = str_checkname(ls); /* first variable name */ + switch (ls->t.token) { + case '=': fornum(ls, varname, line); break; + case ',': case TK_IN: forlist(ls, varname); break; + default: luaX_syntaxerror(ls, LUA_QL("=") " or " LUA_QL("in") " expected"); + } + check_match(ls, TK_END, TK_FOR, line); + leaveblock(fs); /* loop scope (`break' jumps to this point) */ +} + + +static void test_then_block (LexState *ls, int *escapelist) { + /* test_then_block -> [IF | ELSEIF] cond THEN block */ + BlockCnt bl; + FuncState *fs = ls->fs; + expdesc v; + int jf; /* instruction to skip 'then' code (if condition is false) */ + luaX_next(ls); /* skip IF or ELSEIF */ + expr(ls, &v); /* read condition */ + checknext(ls, TK_THEN); + if (ls->t.token == TK_GOTO || ls->t.token == TK_BREAK) { + luaK_goiffalse(ls->fs, &v); /* will jump to label if condition is true */ + enterblock(fs, &bl, 0); /* must enter block before 'goto' */ + gotostat(ls, v.t); /* handle goto/break */ + skipnoopstat(ls); /* skip other no-op statements */ + if (block_follow(ls, 0)) { /* 'goto' is the entire block? */ + leaveblock(fs); + return; /* and that is it */ + } + else /* must skip over 'then' part if condition is false */ + jf = luaK_jump(fs); + } + else { /* regular case (not goto/break) */ + luaK_goiftrue(ls->fs, &v); /* skip over block if condition is false */ + enterblock(fs, &bl, 0); + jf = v.f; + } + statlist(ls); /* `then' part */ + leaveblock(fs); + if (ls->t.token == TK_ELSE || + ls->t.token == TK_ELSEIF) /* followed by 'else'/'elseif'? 
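(i.e. more branches follow that this finished one must not fall into)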
*/ + luaK_concat(fs, escapelist, luaK_jump(fs)); /* must jump over it */ + luaK_patchtohere(fs, jf); +} + + +static void ifstat (LexState *ls, int line) { + /* ifstat -> IF cond THEN block {ELSEIF cond THEN block} [ELSE block] END */ + FuncState *fs = ls->fs; + int escapelist = NO_JUMP; /* exit list for finished parts */ + test_then_block(ls, &escapelist); /* IF cond THEN block */ + while (ls->t.token == TK_ELSEIF) + test_then_block(ls, &escapelist); /* ELSEIF cond THEN block */ + if (testnext(ls, TK_ELSE)) + block(ls); /* `else' part */ + check_match(ls, TK_END, TK_IF, line); + luaK_patchtohere(fs, escapelist); /* patch escape list to 'if' end */ +} + + +static void localfunc (LexState *ls) { + expdesc b; + FuncState *fs = ls->fs; + new_localvar(ls, str_checkname(ls)); /* new local variable */ + adjustlocalvars(ls, 1); /* enter its scope */ + body(ls, &b, 0, ls->linenumber); /* function created in next register */ + /* debug information will only see the variable after this point! */ + getlocvar(fs, b.u.info)->startpc = fs->pc; +} + + +static void localstat (LexState *ls) { + /* stat -> LOCAL NAME {`,' NAME} [`=' explist] */ + int nvars = 0; + int nexps; + expdesc e; + do { + new_localvar(ls, str_checkname(ls)); + nvars++; + } while (testnext(ls, ',')); + if (testnext(ls, '=')) + nexps = explist(ls, &e); + else { + e.k = VVOID; + nexps = 0; + } + adjust_assign(ls, nvars, nexps, &e); + adjustlocalvars(ls, nvars); +} + + +static int funcname (LexState *ls, expdesc *v) { + /* funcname -> NAME {fieldsel} [`:' NAME] */ + int ismethod = 0; + singlevar(ls, v); + while (ls->t.token == '.') + fieldsel(ls, v); + if (ls->t.token == ':') { + ismethod = 1; + fieldsel(ls, v); + } + return ismethod; +} + + +static void funcstat (LexState *ls, int line) { + /* funcstat -> FUNCTION funcname body */ + int ismethod; + expdesc v, b; + luaX_next(ls); /* skip FUNCTION */ + ismethod = funcname(ls, &v); + body(ls, &b, ismethod, line); + luaK_storevar(ls->fs, &v, &b); + luaK_fixline(ls->fs, line); /* definition `happens' in the first line */ +} + + +static void exprstat (LexState *ls) { + /* stat -> func | assignment */ + FuncState *fs = ls->fs; + struct LHS_assign v; + suffixedexp(ls, &v.v); + if (ls->t.token == '=' || ls->t.token == ',') { /* stat -> assignment ? */ + v.prev = NULL; + assignment(ls, &v, 1); + } + else { /* stat -> func */ + check_condition(ls, v.v.k == VCALL, "syntax error"); + SETARG_C(getcode(fs, &v.v), 1); /* call statement uses no results */ + } +} + + +static void retstat (LexState *ls) { + /* stat -> RETURN [explist] [';'] */ + FuncState *fs = ls->fs; + expdesc e; + int first, nret; /* registers with returned values */ + if (block_follow(ls, 1) || ls->t.token == ';') + first = nret = 0; /* return no values */ + else { + nret = explist(ls, &e); /* optional return values */ + if (hasmultret(e.k)) { + luaK_setmultret(fs, &e); + if (e.k == VCALL && nret == 1) { /* tail call? */ + SET_OPCODE(getcode(fs,&e), OP_TAILCALL); + lua_assert(GETARG_A(getcode(fs,&e)) == fs->nactvar); + } + first = fs->nactvar; + nret = LUA_MULTRET; /* return all values */ + } + else { + if (nret == 1) /* only one single value? 
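('return x' can return the value from whatever register it already occupies)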
*/ + first = luaK_exp2anyreg(fs, &e); + else { + luaK_exp2nextreg(fs, &e); /* values must go to the `stack' */ + first = fs->nactvar; /* return all `active' values */ + lua_assert(nret == fs->freereg - first); + } + } + } + luaK_ret(fs, first, nret); + testnext(ls, ';'); /* skip optional semicolon */ +} + + +static void statement (LexState *ls) { + int line = ls->linenumber; /* may be needed for error messages */ + enterlevel(ls); + switch (ls->t.token) { + case ';': { /* stat -> ';' (empty statement) */ + luaX_next(ls); /* skip ';' */ + break; + } + case TK_IF: { /* stat -> ifstat */ + ifstat(ls, line); + break; + } + case TK_WHILE: { /* stat -> whilestat */ + whilestat(ls, line); + break; + } + case TK_DO: { /* stat -> DO block END */ + luaX_next(ls); /* skip DO */ + block(ls); + check_match(ls, TK_END, TK_DO, line); + break; + } + case TK_FOR: { /* stat -> forstat */ + forstat(ls, line); + break; + } + case TK_REPEAT: { /* stat -> repeatstat */ + repeatstat(ls, line); + break; + } + case TK_FUNCTION: { /* stat -> funcstat */ + funcstat(ls, line); + break; + } + case TK_LOCAL: { /* stat -> localstat */ + luaX_next(ls); /* skip LOCAL */ + if (testnext(ls, TK_FUNCTION)) /* local function? */ + localfunc(ls); + else + localstat(ls); + break; + } + case TK_DBCOLON: { /* stat -> label */ + luaX_next(ls); /* skip double colon */ + labelstat(ls, str_checkname(ls), line); + break; + } + case TK_RETURN: { /* stat -> retstat */ + luaX_next(ls); /* skip RETURN */ + retstat(ls); + break; + } + case TK_BREAK: /* stat -> breakstat */ + case TK_GOTO: { /* stat -> 'goto' NAME */ + gotostat(ls, luaK_jump(ls->fs)); + break; + } + default: { /* stat -> func | assignment */ + exprstat(ls); + break; + } + } + lua_assert(ls->fs->f->maxstacksize >= ls->fs->freereg && + ls->fs->freereg >= ls->fs->nactvar); + ls->fs->freereg = ls->fs->nactvar; /* free registers */ + leavelevel(ls); +} + +/* }====================================================================== */ + + +/* +** compiles the main function, which is a regular vararg function with an +** upvalue named LUA_ENV +*/ +static void mainfunc (LexState *ls, FuncState *fs) { + BlockCnt bl; + expdesc v; + open_func(ls, fs, &bl); + fs->f->is_vararg = 1; /* main function is always vararg */ + init_exp(&v, VLOCAL, 0); /* create and... 
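(modelled as a local in register 0)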
*/
+  newupvalue(fs, ls->envn, &v);  /* ...set environment upvalue */
+  luaX_next(ls);  /* read first token */
+  statlist(ls);  /* parse main body */
+  check(ls, TK_EOS);
+  close_func(ls);
+}
+
+
+Closure *luaY_parser (lua_State *L, ZIO *z, Mbuffer *buff,
+                      Dyndata *dyd, const char *name, int firstchar) {
+  LexState lexstate;
+  FuncState funcstate;
+  Closure *cl = luaF_newLclosure(L, 1);  /* create main closure */
+  /* anchor closure (to avoid being collected) */
+  setclLvalue(L, L->top, cl);
+  incr_top(L);
+  funcstate.f = cl->l.p = luaF_newproto(L);
+  funcstate.f->source = luaS_new(L, name);  /* create and anchor TString */
+  lexstate.buff = buff;
+  lexstate.dyd = dyd;
+  dyd->actvar.n = dyd->gt.n = dyd->label.n = 0;
+  luaX_setinput(L, &lexstate, z, funcstate.f->source, firstchar);
+  mainfunc(&lexstate, &funcstate);
+  lua_assert(!funcstate.prev && funcstate.nups == 1 && !lexstate.fs);
+  /* all scopes should be correctly finished */
+  lua_assert(dyd->actvar.n == 0 && dyd->gt.n == 0 && dyd->label.n == 0);
+  return cl;  /* it's on the stack too */
+}
+
diff --git a/ext/lua/src/lstate.c b/ext/lua/src/lstate.c
new file mode 100644
index 000000000..207a106d5
--- /dev/null
+++ b/ext/lua/src/lstate.c
@@ -0,0 +1,322 @@
+/*
+** $Id: lstate.c,v 2.99 2012/10/02 17:40:53 roberto Exp $
+** Global State
+** See Copyright Notice in lua.h
+*/
+
+
+#include <stddef.h>
+#include <string.h>
+
+#define lstate_c
+#define LUA_CORE
+
+#include "lua.h"
+
+#include "lapi.h"
+#include "ldebug.h"
+#include "ldo.h"
+#include "lfunc.h"
+#include "lgc.h"
+#include "llex.h"
+#include "lmem.h"
+#include "lstate.h"
+#include "lstring.h"
+#include "ltable.h"
+#include "ltm.h"
+
+
+#if !defined(LUAI_GCPAUSE)
+#define LUAI_GCPAUSE	200  /* 200% */
+#endif
+
+#if !defined(LUAI_GCMAJOR)
+#define LUAI_GCMAJOR	200  /* 200% */
+#endif
+
+#if !defined(LUAI_GCMUL)
+#define LUAI_GCMUL	200 /* GC runs 'twice the speed' of memory allocation */
+#endif
+
+
+#define MEMERRMSG	"not enough memory"
+
+
+/*
+** a macro to help the creation of a unique random seed when a state is
+** created; the seed is used to randomize hashes.
+*/
+#if !defined(luai_makeseed)
+#include <time.h>
+#define luai_makeseed()		cast(unsigned int, time(NULL))
+#endif
+
+
+
+/*
+** thread state + extra space
+*/
+typedef struct LX {
+#if defined(LUAI_EXTRASPACE)
+  char buff[LUAI_EXTRASPACE];
+#endif
+  lua_State l;
+} LX;
+
+
+/*
+** Main thread combines a thread state and the global state
+*/
+typedef struct LG {
+  LX l;
+  global_State g;
+} LG;
+
+
+
+#define fromstate(L)	(cast(LX *, cast(lu_byte *, (L)) - offsetof(LX, l)))
+
+
+/*
+** Compute an initial seed as random as possible. In ANSI, rely on
+** Address Space Layout Randomization (if present) to increase
+** randomness.
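+**
+** The buffer built below mixes the current time with the addresses of
+** a heap object, a local variable, a global variable, and a public
+** function, so any variation coming from ASLR or the allocator
+** changes the resulting seed.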
+*/ +#define addbuff(b,p,e) \ + { size_t t = cast(size_t, e); \ + memcpy(buff + p, &t, sizeof(t)); p += sizeof(t); } + +static unsigned int makeseed (lua_State *L) { + char buff[4 * sizeof(size_t)]; + unsigned int h = luai_makeseed(); + int p = 0; + addbuff(buff, p, L); /* heap variable */ + addbuff(buff, p, &h); /* local variable */ + addbuff(buff, p, luaO_nilobject); /* global variable */ + addbuff(buff, p, &lua_newstate); /* public function */ + lua_assert(p == sizeof(buff)); + return luaS_hash(buff, p, h); +} + + +/* +** set GCdebt to a new value keeping the value (totalbytes + GCdebt) +** invariant +*/ +void luaE_setdebt (global_State *g, l_mem debt) { + g->totalbytes -= (debt - g->GCdebt); + g->GCdebt = debt; +} + + +CallInfo *luaE_extendCI (lua_State *L) { + CallInfo *ci = luaM_new(L, CallInfo); + lua_assert(L->ci->next == NULL); + L->ci->next = ci; + ci->previous = L->ci; + ci->next = NULL; + return ci; +} + + +void luaE_freeCI (lua_State *L) { + CallInfo *ci = L->ci; + CallInfo *next = ci->next; + ci->next = NULL; + while ((ci = next) != NULL) { + next = ci->next; + luaM_free(L, ci); + } +} + + +static void stack_init (lua_State *L1, lua_State *L) { + int i; CallInfo *ci; + /* initialize stack array */ + L1->stack = luaM_newvector(L, BASIC_STACK_SIZE, TValue); + L1->stacksize = BASIC_STACK_SIZE; + for (i = 0; i < BASIC_STACK_SIZE; i++) + setnilvalue(L1->stack + i); /* erase new stack */ + L1->top = L1->stack; + L1->stack_last = L1->stack + L1->stacksize - EXTRA_STACK; + /* initialize first ci */ + ci = &L1->base_ci; + ci->next = ci->previous = NULL; + ci->callstatus = 0; + ci->func = L1->top; + setnilvalue(L1->top++); /* 'function' entry for this 'ci' */ + ci->top = L1->top + LUA_MINSTACK; + L1->ci = ci; +} + + +static void freestack (lua_State *L) { + if (L->stack == NULL) + return; /* stack not completely built yet */ + L->ci = &L->base_ci; /* free the entire 'ci' list */ + luaE_freeCI(L); + luaM_freearray(L, L->stack, L->stacksize); /* free stack array */ +} + + +/* +** Create registry table and its predefined values +*/ +static void init_registry (lua_State *L, global_State *g) { + TValue mt; + /* create registry */ + Table *registry = luaH_new(L); + sethvalue(L, &g->l_registry, registry); + luaH_resize(L, registry, LUA_RIDX_LAST, 0); + /* registry[LUA_RIDX_MAINTHREAD] = L */ + setthvalue(L, &mt, L); + luaH_setint(L, registry, LUA_RIDX_MAINTHREAD, &mt); + /* registry[LUA_RIDX_GLOBALS] = table of globals */ + sethvalue(L, &mt, luaH_new(L)); + luaH_setint(L, registry, LUA_RIDX_GLOBALS, &mt); +} + + +/* +** open parts of the state that may cause memory-allocation errors +*/ +static void f_luaopen (lua_State *L, void *ud) { + global_State *g = G(L); + UNUSED(ud); + stack_init(L, L); /* init stack */ + init_registry(L, g); + luaS_resize(L, MINSTRTABSIZE); /* initial size of string table */ + luaT_init(L); + luaX_init(L); + /* pre-create memory-error message */ + g->memerrmsg = luaS_newliteral(L, MEMERRMSG); + luaS_fix(g->memerrmsg); /* it should never be collected */ + g->gcrunning = 1; /* allow gc */ +} + + +/* +** preinitialize a state with consistent values without allocating +** any memory (to avoid errors) +*/ +static void preinit_state (lua_State *L, global_State *g) { + G(L) = g; + L->stack = NULL; + L->ci = NULL; + L->stacksize = 0; + L->errorJmp = NULL; + L->nCcalls = 0; + L->hook = NULL; + L->hookmask = 0; + L->basehookcount = 0; + L->allowhook = 1; + resethookcount(L); + L->openupval = NULL; + L->nny = 1; + L->status = LUA_OK; + L->errfunc = 0; +} + + +static void 
close_state (lua_State *L) { + global_State *g = G(L); + luaF_close(L, L->stack); /* close all upvalues for this thread */ + luaC_freeallobjects(L); /* collect all objects */ + luaM_freearray(L, G(L)->strt.hash, G(L)->strt.size); + luaZ_freebuffer(L, &g->buff); + freestack(L); + lua_assert(gettotalbytes(g) == sizeof(LG)); + (*g->frealloc)(g->ud, fromstate(L), sizeof(LG), 0); /* free main block */ +} + + +LUA_API lua_State *lua_newthread (lua_State *L) { + lua_State *L1; + lua_lock(L); + luaC_checkGC(L); + L1 = &luaC_newobj(L, LUA_TTHREAD, sizeof(LX), NULL, offsetof(LX, l))->th; + setthvalue(L, L->top, L1); + api_incr_top(L); + preinit_state(L1, G(L)); + L1->hookmask = L->hookmask; + L1->basehookcount = L->basehookcount; + L1->hook = L->hook; + resethookcount(L1); + luai_userstatethread(L, L1); + stack_init(L1, L); /* init stack */ + lua_unlock(L); + return L1; +} + + +void luaE_freethread (lua_State *L, lua_State *L1) { + LX *l = fromstate(L1); + luaF_close(L1, L1->stack); /* close all upvalues for this thread */ + lua_assert(L1->openupval == NULL); + luai_userstatefree(L, L1); + freestack(L1); + luaM_free(L, l); +} + + +LUA_API lua_State *lua_newstate (lua_Alloc f, void *ud) { + int i; + lua_State *L; + global_State *g; + LG *l = cast(LG *, (*f)(ud, NULL, LUA_TTHREAD, sizeof(LG))); + if (l == NULL) return NULL; + L = &l->l.l; + g = &l->g; + L->next = NULL; + L->tt = LUA_TTHREAD; + g->currentwhite = bit2mask(WHITE0BIT, FIXEDBIT); + L->marked = luaC_white(g); + g->gckind = KGC_NORMAL; + preinit_state(L, g); + g->frealloc = f; + g->ud = ud; + g->mainthread = L; + g->seed = makeseed(L); + g->uvhead.u.l.prev = &g->uvhead; + g->uvhead.u.l.next = &g->uvhead; + g->gcrunning = 0; /* no GC while building state */ + g->GCestimate = 0; + g->strt.size = 0; + g->strt.nuse = 0; + g->strt.hash = NULL; + setnilvalue(&g->l_registry); + luaZ_initbuffer(L, &g->buff); + g->panic = NULL; + g->version = lua_version(NULL); + g->gcstate = GCSpause; + g->allgc = NULL; + g->finobj = NULL; + g->tobefnz = NULL; + g->sweepgc = g->sweepfin = NULL; + g->gray = g->grayagain = NULL; + g->weak = g->ephemeron = g->allweak = NULL; + g->totalbytes = sizeof(LG); + g->GCdebt = 0; + g->gcpause = LUAI_GCPAUSE; + g->gcmajorinc = LUAI_GCMAJOR; + g->gcstepmul = LUAI_GCMUL; + for (i=0; i < LUA_NUMTAGS; i++) g->mt[i] = NULL; + if (luaD_rawrunprotected(L, f_luaopen, NULL) != LUA_OK) { + /* memory allocation error: free partial state */ + close_state(L); + L = NULL; + } + else + luai_userstateopen(L); + return L; +} + + +LUA_API void lua_close (lua_State *L) { + L = G(L)->mainthread; /* only the main thread can be closed */ + lua_lock(L); + luai_userstateclose(L); + close_state(L); +} +
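A minimal embedding sketch of the lifecycle implemented above, using only the public API (luaL_newstate is the stock wrapper that passes a default allocator to lua_newstate; luaL_dostring and friends come from lauxlib):

#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

int main (void) {
  lua_State *L = luaL_newstate();   /* lua_newstate + default allocator */
  if (L == NULL) return 1;          /* allocation of the main block failed */
  luaL_openlibs(L);                 /* f_luaopen has already built the core state */
  if (luaL_dostring(L, "print('hello')") != LUA_OK)
    fprintf(stderr, "%s\n", lua_tostring(L, -1));
  lua_close(L);                     /* close_state: collect all objects, free LG */
  return 0;
}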
diff --git a/ext/lua/src/lstring.c b/ext/lua/src/lstring.c new file mode 100644 index 000000000..8b5af0b2e --- /dev/null +++ b/ext/lua/src/lstring.c @@ -0,0 +1,185 @@ +/* +** $Id: lstring.c,v 2.26 2013/01/08 13:50:10 roberto Exp $ +** String table (keeps all strings handled by Lua) +** See Copyright Notice in lua.h +*/ + + +#include <string.h> + +#define lstring_c +#define LUA_CORE + +#include "lua.h" + +#include "lmem.h" +#include "lobject.h" +#include "lstate.h" +#include "lstring.h" + + +/* +** Lua will use at most ~(2^LUAI_HASHLIMIT) bytes from a string to +** compute its hash +*/ +#if !defined(LUAI_HASHLIMIT) +#define LUAI_HASHLIMIT 5 +#endif + + +/* +** equality for long strings +*/ +int luaS_eqlngstr (TString *a, TString *b) { + size_t len = a->tsv.len; + lua_assert(a->tsv.tt == LUA_TLNGSTR && b->tsv.tt == LUA_TLNGSTR); + return (a == b) || /* same instance or... */ + ((len == b->tsv.len) && /* equal length and ... */ + (memcmp(getstr(a), getstr(b), len) == 0)); /* equal contents */ +} + + +/* +** equality for strings +*/ +int luaS_eqstr (TString *a, TString *b) { + return (a->tsv.tt == b->tsv.tt) && + (a->tsv.tt == LUA_TSHRSTR ? eqshrstr(a, b) : luaS_eqlngstr(a, b)); +} + + +unsigned int luaS_hash (const char *str, size_t l, unsigned int seed) { + unsigned int h = seed ^ cast(unsigned int, l); + size_t l1; + size_t step = (l >> LUAI_HASHLIMIT) + 1; + for (l1 = l; l1 >= step; l1 -= step) + h = h ^ ((h<<5) + (h>>2) + cast_byte(str[l1 - 1])); + return h; +} + + +/* +** resizes the string table +*/ +void luaS_resize (lua_State *L, int newsize) { + int i; + stringtable *tb = &G(L)->strt; + /* cannot resize while GC is traversing strings */ + luaC_runtilstate(L, ~bitmask(GCSsweepstring)); + if (newsize > tb->size) { + luaM_reallocvector(L, tb->hash, tb->size, newsize, GCObject *); + for (i = tb->size; i < newsize; i++) tb->hash[i] = NULL; + } + /* rehash */ + for (i=0; i<tb->size; i++) { + GCObject *p = tb->hash[i]; + tb->hash[i] = NULL; + while (p) { /* for each node in the list */ + GCObject *next = gch(p)->next; /* save next */ + unsigned int h = lmod(gco2ts(p)->hash, newsize); /* new position */ + gch(p)->next = tb->hash[h]; /* chain it */ + tb->hash[h] = p; + resetoldbit(p); /* see MOVE OLD rule */ + p = next; + } + } + if (newsize < tb->size) { + /* shrinking slice must be empty */ + lua_assert(tb->hash[newsize] == NULL && tb->hash[tb->size - 1] == NULL); + luaM_reallocvector(L, tb->hash, tb->size, newsize, GCObject *); + } + tb->size = newsize; +} + + +/* +** creates a new string object +*/ +static TString *createstrobj (lua_State *L, const char *str, size_t l, + int tag, unsigned int h, GCObject **list) { + TString *ts; + size_t totalsize; /* total size of TString object */ + totalsize = sizeof(TString) + ((l + 1) * sizeof(char)); + ts = &luaC_newobj(L, tag, totalsize, list, 0)->ts; + ts->tsv.len = l; + ts->tsv.hash = h; + ts->tsv.extra = 0; + memcpy(ts+1, str, l*sizeof(char)); + ((char *)(ts+1))[l] = '\0'; /* ending 0 */ + return ts; +} + + +/* +** creates a new short string, inserting it into string table +*/ +static TString *newshrstr (lua_State *L, const char *str, size_t l, + unsigned int h) { + GCObject **list; /* (pointer to) list where it will be inserted */ + stringtable *tb = &G(L)->strt; + TString *s; + if (tb->nuse >= cast(lu_int32, tb->size) && tb->size <= MAX_INT/2) + luaS_resize(L, tb->size*2); /* too crowded */ + list = &tb->hash[lmod(h, tb->size)]; + s = createstrobj(L, str, l, LUA_TSHRSTR, h, list); + tb->nuse++; + return s; +} + + +/* +** checks whether short string exists and reuses it or creates a new one +*/ +static TString *internshrstr (lua_State *L, const char *str, size_t l) { + GCObject *o; + global_State *g = G(L); + unsigned int h = luaS_hash(str, l, g->seed); + for (o = g->strt.hash[lmod(h, g->strt.size)]; + o != NULL; + o = gch(o)->next) { + TString *ts = rawgco2ts(o); + if (h == ts->tsv.hash && + l == ts->tsv.len && + (memcmp(str, getstr(ts), l * sizeof(char)) == 0)) { + if (isdead(G(L), o)) /* string is dead (but was not collected yet)? */ + changewhite(o); /* resurrect it */ + return ts; + } + } + return newshrstr(L, str, l, h); /* not found; create a new string */ +} + + +/* +** new string (with explicit length) +*/ +TString *luaS_newlstr (lua_State *L, const char *str, size_t l) { + if (l <= LUAI_MAXSHORTLEN) /* short string? */ + return internshrstr(L, str, l); + else { + if (l + 1 > (MAX_SIZET - sizeof(TString))/sizeof(char)) + luaM_toobig(L); + return createstrobj(L, str, l, LUA_TLNGSTR, G(L)->seed, NULL); + } +} + + +/* +** new zero-terminated string +*/ +TString *luaS_new (lua_State *L, const char *str) { + return luaS_newlstr(L, str, strlen(str)); +} + + +Udata *luaS_newudata (lua_State *L, size_t s, Table *e) { + Udata *u; + if (s > MAX_SIZET - sizeof(Udata)) + luaM_toobig(L); + u = &luaC_newobj(L, LUA_TUSERDATA, sizeof(Udata) + s, NULL, 0)->u; + u->uv.len = s; + u->uv.metatable = NULL; + u->uv.env = e; + return u; +} +
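A sketch of what the interning above buys, seen from the public API (intern_demo is a hypothetical helper; the pointer identity of short strings is internal, but it is what makes raw equality a cheap test):

#include <assert.h>
#include <lua.h>

static void intern_demo (lua_State *L) {
  lua_pushliteral(L, "key");        /* internshrstr creates "key" */
  lua_pushliteral(L, "key");        /* the same TString is found and reused */
  assert(lua_rawequal(L, -1, -2));  /* short strings compare with eqshrstr */
  lua_pop(L, 2);
}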
diff --git a/ext/lua/src/lstrlib.c b/ext/lua/src/lstrlib.c new file mode 100644 index 000000000..fcc61c9a6 --- /dev/null +++ b/ext/lua/src/lstrlib.c @@ -0,0 +1,1019 @@ +/* +** $Id: lstrlib.c,v 1.178 2012/08/14 18:12:34 roberto Exp $ +** Standard library for string operations and pattern-matching +** See Copyright Notice in lua.h +*/ + + +#include <ctype.h> +#include <stddef.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#define lstrlib_c +#define LUA_LIB + +#include "lua.h" + +#include "lauxlib.h" +#include "lualib.h" + + +/* +** maximum number of captures that a pattern can do during +** pattern-matching. This limit is arbitrary. +*/ +#if !defined(LUA_MAXCAPTURES) +#define LUA_MAXCAPTURES 32 +#endif + + +/* macro to `unsign' a character */ +#define uchar(c) ((unsigned char)(c)) + + + +static int str_len (lua_State *L) { + size_t l; + luaL_checklstring(L, 1, &l); + lua_pushinteger(L, (lua_Integer)l); + return 1; +} + + +/* translate a relative string position: negative means back from end */ +static size_t posrelat (ptrdiff_t pos, size_t len) { + if (pos >= 0) return (size_t)pos; + else if (0u - (size_t)pos > len) return 0; + else return len - ((size_t)-pos) + 1; +} + + +static int str_sub (lua_State *L) { + size_t l; + const char *s = luaL_checklstring(L, 1, &l); + size_t start = posrelat(luaL_checkinteger(L, 2), l); + size_t end = posrelat(luaL_optinteger(L, 3, -1), l); + if (start < 1) start = 1; + if (end > l) end = l; + if (start <= end) + lua_pushlstring(L, s + start - 1, end - start + 1); + else lua_pushliteral(L, ""); + return 1; +} + + +static int str_reverse (lua_State *L) { + size_t l, i; + luaL_Buffer b; + const char *s = luaL_checklstring(L, 1, &l); + char *p = luaL_buffinitsize(L, &b, l); + for (i = 0; i < l; i++) + p[i] = s[l - i - 1]; + luaL_pushresultsize(&b, l); + return 1; +} + + +static int str_lower (lua_State *L) { + size_t l; + size_t i; + luaL_Buffer b; + const char *s = luaL_checklstring(L, 1, &l); + char *p = luaL_buffinitsize(L, &b, l); + for (i=0; i<l; i++) + p[i] = tolower(uchar(s[i])); + luaL_pushresultsize(&b, l); + return 1; +} + + +static int str_upper (lua_State *L) { + size_t l; + size_t i; + luaL_Buffer b; + const char *s = luaL_checklstring(L, 1, &l); + char *p = luaL_buffinitsize(L, &b, l); + for (i=0; i<l; i++) + p[i] = toupper(uchar(s[i])); + luaL_pushresultsize(&b, l); + return 1; +} + + +#define MAXSIZE ((~(size_t)0) >> 1) + +static int str_rep (lua_State *L) { + size_t l, lsep; + const char *s = luaL_checklstring(L, 1, &l); + int n = luaL_checkint(L, 2); + const char *sep = luaL_optlstring(L, 3, "", &lsep); + if (n <= 0) lua_pushliteral(L, ""); + else if (l + lsep < l || l + lsep >= MAXSIZE / n) /* may overflow? */ + return luaL_error(L, "resulting string too large"); + else { + size_t totallen = n * l + (n - 1) * lsep; + luaL_Buffer b; + char *p = luaL_buffinitsize(L, &b, totallen); + while (n-- > 1) { /* first n-1 copies (followed by separator) */ + memcpy(p, s, l * sizeof(char)); p += l; + if (lsep > 0) { /* avoid empty 'memcpy' (may be expensive) */ + memcpy(p, sep, lsep * sizeof(char)); p += lsep; + } + } + memcpy(p, s, l * sizeof(char)); /* last copy (not followed by separator) */ + luaL_pushresultsize(&b, totallen); + } + return 1; +} +
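A usage sketch for the functions above, driven through a lua_State L assumed to be open (behavior as documented for the 5.2 string library, including the separator argument that str_rep implements):

luaL_dostring(L, "assert(string.rep('ab', 3, '-') == 'ab-ab-ab')");
luaL_dostring(L, "assert(string.sub('hello', 2, -2) == 'ell')");
luaL_dostring(L, "assert(string.reverse('abc') == 'cba')");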
+static int str_byte (lua_State *L) { + size_t l; + const char *s = luaL_checklstring(L, 1, &l); + size_t posi = posrelat(luaL_optinteger(L, 2, 1), l); + size_t pose = posrelat(luaL_optinteger(L, 3, posi), l); + int n, i; + if (posi < 1) posi = 1; + if (pose > l) pose = l; + if (posi > pose) return 0; /* empty interval; return no values */ + n = (int)(pose - posi + 1); + if (posi + n <= pose) /* (size_t -> int) overflow? */ + return luaL_error(L, "string slice too long"); + luaL_checkstack(L, n, "string slice too long"); + for (i=0; i<n; i++) + lua_pushinteger(L, uchar(s[posi+i-1])); + return n; +} + + +static int str_char (lua_State *L) { + int n = lua_gettop(L); /* number of arguments */ + int i; + luaL_Buffer b; + char *p = luaL_buffinitsize(L, &b, n); + for (i=1; i<=n; i++) { + int c = luaL_checkint(L, i); + luaL_argcheck(L, uchar(c) == c, i, "value out of range"); + p[i - 1] = uchar(c); + } + luaL_pushresultsize(&b, n); + return 1; +} + + +static int writer (lua_State *L, const void* b, size_t size, void* B) { + (void)L; + luaL_addlstring((luaL_Buffer*) B, (const char *)b, size); + return 0; +} + + +static int str_dump (lua_State *L) { + luaL_Buffer b; + luaL_checktype(L, 1, LUA_TFUNCTION); + lua_settop(L, 1); + luaL_buffinit(L,&b); + if (lua_dump(L, writer, &b) != 0) + return luaL_error(L, "unable to dump given function"); + luaL_pushresult(&b); + return 1; +} + + + +/* +** {====================================================== +** PATTERN MATCHING +** ======================================================= +*/ + + +#define CAP_UNFINISHED (-1) +#define CAP_POSITION (-2) + + +typedef struct MatchState { + int matchdepth; /* control for recursive depth (to avoid C stack overflow) */ + const char *src_init; /* init of source string */ + const char *src_end; /* end ('\0') of source string */ + const char *p_end; /* end ('\0') of pattern */ + lua_State *L; + int level; /* total number of captures (finished or unfinished) */ + struct { + const char *init; + ptrdiff_t len; + } capture[LUA_MAXCAPTURES]; +} MatchState; + + +/* recursive function */ +static const char *match (MatchState *ms, const char *s, const char *p); + + +/* maximum recursion depth for 'match' */ +#if !defined(MAXCCALLS) +#define MAXCCALLS 200 +#endif + + +#define L_ESC '%' +#define SPECIALS "^$*+?.([%-" + + +static int check_capture (MatchState *ms, int l) { + l -= '1'; + if (l < 0 || l >= ms->level || ms->capture[l].len == CAP_UNFINISHED) + return luaL_error(ms->L, "invalid capture index %%%d", l + 1); + return l; +} + + +static int capture_to_close (MatchState *ms) { + int level = ms->level; + for (level--; level>=0; level--) + if (ms->capture[level].len == CAP_UNFINISHED) return level; + return luaL_error(ms->L, "invalid pattern capture"); +} + + +static const char *classend (MatchState *ms, const char *p) { + switch (*p++) { + case L_ESC: { + if (p == ms->p_end) + luaL_error(ms->L, "malformed pattern (ends with " LUA_QL("%%") ")"); + return p+1; + } + case '[': { + if (*p == '^') p++; + do { /* look for a `]' */ + if (p == ms->p_end) + luaL_error(ms->L, "malformed pattern (missing " LUA_QL("]") ")"); + if (*(p++) == L_ESC && p < ms->p_end) + p++; /* skip escapes (e.g. `%]') */ + } while (*p != ']'); + return p+1; + } + default: { + return p; + } + } +} + + +static int match_class (int c, int cl) { + int res; + switch (tolower(cl)) { + case 'a' : res = isalpha(c); break; + case 'c' : res = iscntrl(c); break; + case 'd' : res = isdigit(c); break; + case 'g' : res = isgraph(c); break; + case 'l' : res = islower(c); break; + case 'p' : res = ispunct(c); break; + case 's' : res = isspace(c); break; + case 'u' : res = isupper(c); break; + case 'w' : res = isalnum(c); break; + case 'x' : res = isxdigit(c); break; + case 'z' : res = (c == 0); break; /* deprecated option */ + default: return (cl == c); + } + return (islower(cl) ?
res : !res); +} + + +static int matchbracketclass (int c, const char *p, const char *ec) { + int sig = 1; + if (*(p+1) == '^') { + sig = 0; + p++; /* skip the `^' */ + } + while (++p < ec) { + if (*p == L_ESC) { + p++; + if (match_class(c, uchar(*p))) + return sig; + } + else if ((*(p+1) == '-') && (p+2 < ec)) { + p+=2; + if (uchar(*(p-2)) <= c && c <= uchar(*p)) + return sig; + } + else if (uchar(*p) == c) return sig; + } + return !sig; +} + + +static int singlematch (MatchState *ms, const char *s, const char *p, + const char *ep) { + if (s >= ms->src_end) + return 0; + else { + int c = uchar(*s); + switch (*p) { + case '.': return 1; /* matches any char */ + case L_ESC: return match_class(c, uchar(*(p+1))); + case '[': return matchbracketclass(c, p, ep-1); + default: return (uchar(*p) == c); + } + } +} + + +static const char *matchbalance (MatchState *ms, const char *s, + const char *p) { + if (p >= ms->p_end - 1) + luaL_error(ms->L, "malformed pattern " + "(missing arguments to " LUA_QL("%%b") ")"); + if (*s != *p) return NULL; + else { + int b = *p; + int e = *(p+1); + int cont = 1; + while (++s < ms->src_end) { + if (*s == e) { + if (--cont == 0) return s+1; + } + else if (*s == b) cont++; + } + } + return NULL; /* string ends out of balance */ +} + + +static const char *max_expand (MatchState *ms, const char *s, + const char *p, const char *ep) { + ptrdiff_t i = 0; /* counts maximum expand for item */ + while (singlematch(ms, s + i, p, ep)) + i++; + /* keeps trying to match with the maximum repetitions */ + while (i>=0) { + const char *res = match(ms, (s+i), ep+1); + if (res) return res; + i--; /* else didn't match; reduce 1 repetition to try again */ + } + return NULL; +} + + +static const char *min_expand (MatchState *ms, const char *s, + const char *p, const char *ep) { + for (;;) { + const char *res = match(ms, s, ep+1); + if (res != NULL) + return res; + else if (singlematch(ms, s, p, ep)) + s++; /* try with one more repetition */ + else return NULL; + } +} + + +static const char *start_capture (MatchState *ms, const char *s, + const char *p, int what) { + const char *res; + int level = ms->level; + if (level >= LUA_MAXCAPTURES) luaL_error(ms->L, "too many captures"); + ms->capture[level].init = s; + ms->capture[level].len = what; + ms->level = level+1; + if ((res=match(ms, s, p)) == NULL) /* match failed? */ + ms->level--; /* undo capture */ + return res; +} + + +static const char *end_capture (MatchState *ms, const char *s, + const char *p) { + int l = capture_to_close(ms); + const char *res; + ms->capture[l].len = s - ms->capture[l].init; /* close capture */ + if ((res = match(ms, s, p)) == NULL) /* match failed? */ + ms->capture[l].len = CAP_UNFINISHED; /* undo capture */ + return res; +} + + +static const char *match_capture (MatchState *ms, const char *s, int l) { + size_t len; + l = check_capture(ms, l); + len = ms->capture[l].len; + if ((size_t)(ms->src_end-s) >= len && + memcmp(ms->capture[l].init, s, len) == 0) + return s+len; + else return NULL; +} + + +static const char *match (MatchState *ms, const char *s, const char *p) { + if (ms->matchdepth-- == 0) + luaL_error(ms->L, "pattern too complex"); + init: /* using goto's to optimize tail recursion */ + if (p != ms->p_end) { /* end of pattern? */ + switch (*p) { + case '(': { /* start capture */ + if (*(p + 1) == ')') /* position capture? 
*/ + s = start_capture(ms, s, p + 2, CAP_POSITION); + else + s = start_capture(ms, s, p + 1, CAP_UNFINISHED); + break; + } + case ')': { /* end capture */ + s = end_capture(ms, s, p + 1); + break; + } + case '$': { + if ((p + 1) != ms->p_end) /* is the `$' the last char in pattern? */ + goto dflt; /* no; go to default */ + s = (s == ms->src_end) ? s : NULL; /* check end of string */ + break; + } + case L_ESC: { /* escaped sequences not in the format class[*+?-]? */ + switch (*(p + 1)) { + case 'b': { /* balanced string? */ + s = matchbalance(ms, s, p + 2); + if (s != NULL) { + p += 4; goto init; /* return match(ms, s, p + 4); */ + } /* else fail (s == NULL) */ + break; + } + case 'f': { /* frontier? */ + const char *ep; char previous; + p += 2; + if (*p != '[') + luaL_error(ms->L, "missing " LUA_QL("[") " after " + LUA_QL("%%f") " in pattern"); + ep = classend(ms, p); /* points to what is next */ + previous = (s == ms->src_init) ? '\0' : *(s - 1); + if (!matchbracketclass(uchar(previous), p, ep - 1) && + matchbracketclass(uchar(*s), p, ep - 1)) { + p = ep; goto init; /* return match(ms, s, ep); */ + } + s = NULL; /* match failed */ + break; + } + case '0': case '1': case '2': case '3': + case '4': case '5': case '6': case '7': + case '8': case '9': { /* capture results (%0-%9)? */ + s = match_capture(ms, s, uchar(*(p + 1))); + if (s != NULL) { + p += 2; goto init; /* return match(ms, s, p + 2) */ + } + break; + } + default: goto dflt; + } + break; + } + default: dflt: { /* pattern class plus optional suffix */ + const char *ep = classend(ms, p); /* points to optional suffix */ + /* does not match at least once? */ + if (!singlematch(ms, s, p, ep)) { + if (*ep == '*' || *ep == '?' || *ep == '-') { /* accept empty? */ + p = ep + 1; goto init; /* return match(ms, s, ep + 1); */ + } + else /* '+' or no suffix */ + s = NULL; /* fail */ + } + else { /* matched once */ + switch (*ep) { /* handle optional suffix */ + case '?': { /* optional */ + const char *res; + if ((res = match(ms, s + 1, ep + 1)) != NULL) + s = res; + else { + p = ep + 1; goto init; /* else return match(ms, s, ep + 1); */ + } + break; + } + case '+': /* 1 or more repetitions */ + s++; /* 1 match already done */ + /* go through */ + case '*': /* 0 or more repetitions */ + s = max_expand(ms, s, p, ep); + break; + case '-': /* 0 or more repetitions (minimum) */ + s = min_expand(ms, s, p, ep); + break; + default: /* no suffix */ + s++; p = ep; goto init; /* return match(ms, s + 1, ep); */ + } + } + break; + } + } + } + ms->matchdepth++; + return s; +} + + + +static const char *lmemfind (const char *s1, size_t l1, + const char *s2, size_t l2) { + if (l2 == 0) return s1; /* empty strings are everywhere */ + else if (l2 > l1) return NULL; /* avoids a negative `l1' */ + else { + const char *init; /* to search for a `*s2' inside `s1' */ + l2--; /* 1st char will be checked by `memchr' */ + l1 = l1-l2; /* `s2' cannot be found after that */ + while (l1 > 0 && (init = (const char *)memchr(s1, *s2, l1)) != NULL) { + init++; /* 1st char is already checked */ + if (memcmp(init, s2+1, l2) == 0) + return init-1; + else { /* correct `l1' and `s1' to try again */ + l1 -= init-s1; + s1 = init; + } + } + return NULL; /* not found */ + } +} + + +static void push_onecapture (MatchState *ms, int i, const char *s, + const char *e) { + if (i >= ms->level) { + if (i == 0) /* ms->level == 0, too */ + lua_pushlstring(ms->L, s, e - s); /* add whole match */ + else + luaL_error(ms->L, "invalid capture index"); + } + else { + ptrdiff_t l = ms->capture[i].len; 
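+ /* e.g. for string.match("hi", "()(h)"), capture 1 is a position + capture (CAP_POSITION) and is pushed as the integer 1, while + capture 2 is pushed as the string "h" */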
+ if (l == CAP_UNFINISHED) luaL_error(ms->L, "unfinished capture"); + if (l == CAP_POSITION) + lua_pushinteger(ms->L, ms->capture[i].init - ms->src_init + 1); + else + lua_pushlstring(ms->L, ms->capture[i].init, l); + } +} + + +static int push_captures (MatchState *ms, const char *s, const char *e) { + int i; + int nlevels = (ms->level == 0 && s) ? 1 : ms->level; + luaL_checkstack(ms->L, nlevels, "too many captures"); + for (i = 0; i < nlevels; i++) + push_onecapture(ms, i, s, e); + return nlevels; /* number of strings pushed */ +} + + +/* check whether pattern has no special characters */ +static int nospecials (const char *p, size_t l) { + size_t upto = 0; + do { + if (strpbrk(p + upto, SPECIALS)) + return 0; /* pattern has a special character */ + upto += strlen(p + upto) + 1; /* may have more after \0 */ + } while (upto <= l); + return 1; /* no special chars found */ +} + + +static int str_find_aux (lua_State *L, int find) { + size_t ls, lp; + const char *s = luaL_checklstring(L, 1, &ls); + const char *p = luaL_checklstring(L, 2, &lp); + size_t init = posrelat(luaL_optinteger(L, 3, 1), ls); + if (init < 1) init = 1; + else if (init > ls + 1) { /* start after string's end? */ + lua_pushnil(L); /* cannot find anything */ + return 1; + } + /* explicit request or no special characters? */ + if (find && (lua_toboolean(L, 4) || nospecials(p, lp))) { + /* do a plain search */ + const char *s2 = lmemfind(s + init - 1, ls - init + 1, p, lp); + if (s2) { + lua_pushinteger(L, s2 - s + 1); + lua_pushinteger(L, s2 - s + lp); + return 2; + } + } + else { + MatchState ms; + const char *s1 = s + init - 1; + int anchor = (*p == '^'); + if (anchor) { + p++; lp--; /* skip anchor character */ + } + ms.L = L; + ms.matchdepth = MAXCCALLS; + ms.src_init = s; + ms.src_end = s + ls; + ms.p_end = p + lp; + do { + const char *res; + ms.level = 0; + lua_assert(ms.matchdepth == MAXCCALLS); + if ((res=match(&ms, s1, p)) != NULL) { + if (find) { + lua_pushinteger(L, s1 - s + 1); /* start */ + lua_pushinteger(L, res - s); /* end */ + return push_captures(&ms, NULL, 0) + 2; + } + else + return push_captures(&ms, s1, res); + } + } while (s1++ < ms.src_end && !anchor); + } + lua_pushnil(L); /* not found */ + return 1; +} + + +static int str_find (lua_State *L) { + return str_find_aux(L, 1); +} + + +static int str_match (lua_State *L) { + return str_find_aux(L, 0); +} + + +static int gmatch_aux (lua_State *L) { + MatchState ms; + size_t ls, lp; + const char *s = lua_tolstring(L, lua_upvalueindex(1), &ls); + const char *p = lua_tolstring(L, lua_upvalueindex(2), &lp); + const char *src; + ms.L = L; + ms.matchdepth = MAXCCALLS; + ms.src_init = s; + ms.src_end = s+ls; + ms.p_end = p + lp; + for (src = s + (size_t)lua_tointeger(L, lua_upvalueindex(3)); + src <= ms.src_end; + src++) { + const char *e; + ms.level = 0; + lua_assert(ms.matchdepth == MAXCCALLS); + if ((e = match(&ms, src, p)) != NULL) { + lua_Integer newstart = e-s; + if (e == src) newstart++; /* empty match? 
go at least one position */ + lua_pushinteger(L, newstart); + lua_replace(L, lua_upvalueindex(3)); + return push_captures(&ms, src, e); + } + } + return 0; /* not found */ +} + + +static int gmatch (lua_State *L) { + luaL_checkstring(L, 1); + luaL_checkstring(L, 2); + lua_settop(L, 2); + lua_pushinteger(L, 0); + lua_pushcclosure(L, gmatch_aux, 3); + return 1; +} + + +static void add_s (MatchState *ms, luaL_Buffer *b, const char *s, + const char *e) { + size_t l, i; + const char *news = lua_tolstring(ms->L, 3, &l); + for (i = 0; i < l; i++) { + if (news[i] != L_ESC) + luaL_addchar(b, news[i]); + else { + i++; /* skip ESC */ + if (!isdigit(uchar(news[i]))) { + if (news[i] != L_ESC) + luaL_error(ms->L, "invalid use of " LUA_QL("%c") + " in replacement string", L_ESC); + luaL_addchar(b, news[i]); + } + else if (news[i] == '0') + luaL_addlstring(b, s, e - s); + else { + push_onecapture(ms, news[i] - '1', s, e); + luaL_addvalue(b); /* add capture to accumulated result */ + } + } + } +} + + +static void add_value (MatchState *ms, luaL_Buffer *b, const char *s, + const char *e, int tr) { + lua_State *L = ms->L; + switch (tr) { + case LUA_TFUNCTION: { + int n; + lua_pushvalue(L, 3); + n = push_captures(ms, s, e); + lua_call(L, n, 1); + break; + } + case LUA_TTABLE: { + push_onecapture(ms, 0, s, e); + lua_gettable(L, 3); + break; + } + default: { /* LUA_TNUMBER or LUA_TSTRING */ + add_s(ms, b, s, e); + return; + } + } + if (!lua_toboolean(L, -1)) { /* nil or false? */ + lua_pop(L, 1); + lua_pushlstring(L, s, e - s); /* keep original text */ + } + else if (!lua_isstring(L, -1)) + luaL_error(L, "invalid replacement value (a %s)", luaL_typename(L, -1)); + luaL_addvalue(b); /* add result to accumulator */ +} + + +static int str_gsub (lua_State *L) { + size_t srcl, lp; + const char *src = luaL_checklstring(L, 1, &srcl); + const char *p = luaL_checklstring(L, 2, &lp); + int tr = lua_type(L, 3); + size_t max_s = luaL_optinteger(L, 4, srcl+1); + int anchor = (*p == '^'); + size_t n = 0; + MatchState ms; + luaL_Buffer b; + luaL_argcheck(L, tr == LUA_TNUMBER || tr == LUA_TSTRING || + tr == LUA_TFUNCTION || tr == LUA_TTABLE, 3, + "string/function/table expected"); + luaL_buffinit(L, &b); + if (anchor) { + p++; lp--; /* skip anchor character */ + } + ms.L = L; + ms.matchdepth = MAXCCALLS; + ms.src_init = src; + ms.src_end = src+srcl; + ms.p_end = p + lp; + while (n < max_s) { + const char *e; + ms.level = 0; + lua_assert(ms.matchdepth == MAXCCALLS); + e = match(&ms, src, p); + if (e) { + n++; + add_value(&ms, &b, src, e, tr); + } + if (e && e>src) /* non empty match? 
*/ + src = e; /* skip it */ + else if (src < ms.src_end) + luaL_addchar(&b, *src++); + else break; + if (anchor) break; + } + luaL_addlstring(&b, src, ms.src_end-src); + luaL_pushresult(&b); + lua_pushinteger(L, n); /* number of substitutions */ + return 2; +} + +/* }====================================================== */ + + + +/* +** {====================================================== +** STRING FORMAT +** ======================================================= +*/ + +/* +** LUA_INTFRMLEN is the length modifier for integer conversions in +** 'string.format'; LUA_INTFRM_T is the integer type corresponding to +** the previous length +*/ +#if !defined(LUA_INTFRMLEN) /* { */ +#if defined(LUA_USE_LONGLONG) + +#define LUA_INTFRMLEN "ll" +#define LUA_INTFRM_T long long + +#else + +#define LUA_INTFRMLEN "l" +#define LUA_INTFRM_T long + +#endif +#endif /* } */ + + +/* +** LUA_FLTFRMLEN is the length modifier for float conversions in +** 'string.format'; LUA_FLTFRM_T is the float type corresponding to +** the previous length +*/ +#if !defined(LUA_FLTFRMLEN) + +#define LUA_FLTFRMLEN "" +#define LUA_FLTFRM_T double + +#endif + + +/* maximum size of each formatted item (> len(format('%99.99f', -1e308))) */ +#define MAX_ITEM 512 +/* valid flags in a format specification */ +#define FLAGS "-+ #0" +/* +** maximum size of each format specification (such as '%-099.99d') +** (+10 accounts for %99.99x plus margin of error) +*/ +#define MAX_FORMAT (sizeof(FLAGS) + sizeof(LUA_INTFRMLEN) + 10) + + +static void addquoted (lua_State *L, luaL_Buffer *b, int arg) { + size_t l; + const char *s = luaL_checklstring(L, arg, &l); + luaL_addchar(b, '"'); + while (l--) { + if (*s == '"' || *s == '\\' || *s == '\n') { + luaL_addchar(b, '\\'); + luaL_addchar(b, *s); + } + else if (*s == '\0' || iscntrl(uchar(*s))) { + char buff[10]; + if (!isdigit(uchar(*(s+1)))) + sprintf(buff, "\\%d", (int)uchar(*s)); + else + sprintf(buff, "\\%03d", (int)uchar(*s)); + luaL_addstring(b, buff); + } + else + luaL_addchar(b, *s); + s++; + } + luaL_addchar(b, '"'); +} + +static const char *scanformat (lua_State *L, const char *strfrmt, char *form) { + const char *p = strfrmt; + while (*p != '\0' && strchr(FLAGS, *p) != NULL) p++; /* skip flags */ + if ((size_t)(p - strfrmt) >= sizeof(FLAGS)/sizeof(char)) + luaL_error(L, "invalid format (repeated flags)"); + if (isdigit(uchar(*p))) p++; /* skip width */ + if (isdigit(uchar(*p))) p++; /* (2 digits at most) */ + if (*p == '.') { + p++; + if (isdigit(uchar(*p))) p++; /* skip precision */ + if (isdigit(uchar(*p))) p++; /* (2 digits at most) */ + } + if (isdigit(uchar(*p))) + luaL_error(L, "invalid format (width or precision too long)"); + *(form++) = '%'; + memcpy(form, strfrmt, (p - strfrmt + 1) * sizeof(char)); + form += p - strfrmt + 1; + *form = '\0'; + return p; +} + + +/* +** add length modifier into formats +*/ +static void addlenmod (char *form, const char *lenmod) { + size_t l = strlen(form); + size_t lm = strlen(lenmod); + char spec = form[l - 1]; + strcpy(form + l - 1, lenmod); + form[l + lm - 1] = spec; + form[l + lm] = '\0'; +} + + +static int str_format (lua_State *L) { + int top = lua_gettop(L); + int arg = 1; + size_t sfl; + const char *strfrmt = luaL_checklstring(L, arg, &sfl); + const char *strfrmt_end = strfrmt+sfl; + luaL_Buffer b; + luaL_buffinit(L, &b); + while (strfrmt < strfrmt_end) { + if (*strfrmt != L_ESC) + luaL_addchar(&b, *strfrmt++); + else if (*++strfrmt == L_ESC) + luaL_addchar(&b, *strfrmt++); /* %% */ + else { /* format item */ + char form[MAX_FORMAT]; /* 
to store the format (`%...') */ + char *buff = luaL_prepbuffsize(&b, MAX_ITEM); /* to put formatted item */ + int nb = 0; /* number of bytes in added item */ + if (++arg > top) + luaL_argerror(L, arg, "no value"); + strfrmt = scanformat(L, strfrmt, form); + switch (*strfrmt++) { + case 'c': { + nb = sprintf(buff, form, luaL_checkint(L, arg)); + break; + } + case 'd': case 'i': { + lua_Number n = luaL_checknumber(L, arg); + LUA_INTFRM_T ni = (LUA_INTFRM_T)n; + lua_Number diff = n - (lua_Number)ni; + luaL_argcheck(L, -1 < diff && diff < 1, arg, + "not a number in proper range"); + addlenmod(form, LUA_INTFRMLEN); + nb = sprintf(buff, form, ni); + break; + } + case 'o': case 'u': case 'x': case 'X': { + lua_Number n = luaL_checknumber(L, arg); + unsigned LUA_INTFRM_T ni = (unsigned LUA_INTFRM_T)n; + lua_Number diff = n - (lua_Number)ni; + luaL_argcheck(L, -1 < diff && diff < 1, arg, + "not a non-negative number in proper range"); + addlenmod(form, LUA_INTFRMLEN); + nb = sprintf(buff, form, ni); + break; + } + case 'e': case 'E': case 'f': +#if defined(LUA_USE_AFORMAT) + case 'a': case 'A': +#endif + case 'g': case 'G': { + addlenmod(form, LUA_FLTFRMLEN); + nb = sprintf(buff, form, (LUA_FLTFRM_T)luaL_checknumber(L, arg)); + break; + } + case 'q': { + addquoted(L, &b, arg); + break; + } + case 's': { + size_t l; + const char *s = luaL_tolstring(L, arg, &l); + if (!strchr(form, '.') && l >= 100) { + /* no precision and string is too long to be formatted; + keep original string */ + luaL_addvalue(&b); + break; + } + else { + nb = sprintf(buff, form, s); + lua_pop(L, 1); /* remove result from 'luaL_tolstring' */ + break; + } + } + default: { /* also treat cases `pnLlh' */ + return luaL_error(L, "invalid option " LUA_QL("%%%c") " to " + LUA_QL("format"), *(strfrmt - 1)); + } + } + luaL_addsize(&b, nb); + } + } + luaL_pushresult(&b); + return 1; +} + +/* }====================================================== */ + + +static const luaL_Reg strlib[] = { + {"byte", str_byte}, + {"char", str_char}, + {"dump", str_dump}, + {"find", str_find}, + {"format", str_format}, + {"gmatch", gmatch}, + {"gsub", str_gsub}, + {"len", str_len}, + {"lower", str_lower}, + {"match", str_match}, + {"rep", str_rep}, + {"reverse", str_reverse}, + {"sub", str_sub}, + {"upper", str_upper}, + {NULL, NULL} +}; + + +static void createmetatable (lua_State *L) { + lua_createtable(L, 0, 1); /* table to be metatable for strings */ + lua_pushliteral(L, ""); /* dummy string */ + lua_pushvalue(L, -2); /* copy table */ + lua_setmetatable(L, -2); /* set table as metatable for strings */ + lua_pop(L, 1); /* pop dummy string */ + lua_pushvalue(L, -2); /* get string library */ + lua_setfield(L, -2, "__index"); /* metatable.__index = string */ + lua_pop(L, 1); /* pop metatable */ +} + + +/* +** Open string library +*/ +LUAMOD_API int luaopen_string (lua_State *L) { + luaL_newlib(L, strlib); + createmetatable(L); + return 1; +} + diff --git a/ext/lua/src/ltable.c b/ext/lua/src/ltable.c new file mode 100644 index 000000000..420391fc7 --- /dev/null +++ b/ext/lua/src/ltable.c @@ -0,0 +1,588 @@ +/* +** $Id: ltable.c,v 2.72 2012/09/11 19:37:16 roberto Exp $ +** Lua tables (hash) +** See Copyright Notice in lua.h +*/ + + +/* +** Implementation of tables (aka arrays, objects, or hash tables). +** Tables keep its elements in two parts: an array part and a hash part. +** Non-negative integer keys are all candidates to be kept in the array +** part. 
The actual size of the array is the largest `n' such that at +** least half the slots between 0 and n are in use. +** Hash uses a mix of chained scatter table with Brent's variation. +** A main invariant of these tables is that, if an element is not +** in its main position (i.e. the `original' position that its hash gives +** to it), then the colliding element is in its own main position. +** Hence even when the load factor reaches 100%, performance remains good. +*/ + +#include <string.h> + +#define ltable_c +#define LUA_CORE + +#include "lua.h" + +#include "ldebug.h" +#include "ldo.h" +#include "lgc.h" +#include "lmem.h" +#include "lobject.h" +#include "lstate.h" +#include "lstring.h" +#include "ltable.h" +#include "lvm.h" + + +/* +** max size of array part is 2^MAXBITS +*/ +#if LUAI_BITSINT >= 32 +#define MAXBITS 30 +#else +#define MAXBITS (LUAI_BITSINT-2) +#endif + +#define MAXASIZE (1 << MAXBITS) + + +#define hashpow2(t,n) (gnode(t, lmod((n), sizenode(t)))) + +#define hashstr(t,str) hashpow2(t, (str)->tsv.hash) +#define hashboolean(t,p) hashpow2(t, p) + + +/* +** for some types, it is better to avoid modulus by power of 2, as +** they tend to have many 2 factors. +*/ +#define hashmod(t,n) (gnode(t, ((n) % ((sizenode(t)-1)|1)))) + + +#define hashpointer(t,p) hashmod(t, IntPoint(p)) + + +#define dummynode (&dummynode_) + +#define isdummy(n) ((n) == dummynode) + +static const Node dummynode_ = { + {NILCONSTANT}, /* value */ + {{NILCONSTANT, NULL}} /* key */ +}; + + +/* +** hash for lua_Numbers +*/ +static Node *hashnum (const Table *t, lua_Number n) { + int i; + luai_hashnum(i, n); + if (i < 0) { + if (cast(unsigned int, i) == 0u - i) /* use unsigned to avoid overflows */ + i = 0; /* handle INT_MIN */ + i = -i; /* must be a positive value */ + } + return hashmod(t, i); +} + + + +/* +** returns the `main' position of an element in a table (that is, the index +** of its hash value) +*/ +static Node *mainposition (const Table *t, const TValue *key) { + switch (ttype(key)) { + case LUA_TNUMBER: + return hashnum(t, nvalue(key)); + case LUA_TLNGSTR: { + TString *s = rawtsvalue(key); + if (s->tsv.extra == 0) { /* no hash? */ + s->tsv.hash = luaS_hash(getstr(s), s->tsv.len, s->tsv.hash); + s->tsv.extra = 1; /* now it has its hash */ + } + return hashstr(t, rawtsvalue(key)); + } + case LUA_TSHRSTR: + return hashstr(t, rawtsvalue(key)); + case LUA_TBOOLEAN: + return hashboolean(t, bvalue(key)); + case LUA_TLIGHTUSERDATA: + return hashpointer(t, pvalue(key)); + case LUA_TLCF: + return hashpointer(t, fvalue(key)); + default: + return hashpointer(t, gcvalue(key)); + } +} + + +/* +** returns the index for `key' if `key' is an appropriate key to live in +** the array part of the table, -1 otherwise. +*/ +static int arrayindex (const TValue *key) { + if (ttisnumber(key)) { + lua_Number n = nvalue(key); + int k; + lua_number2int(k, n); + if (luai_numeq(cast_num(k), n)) + return k; + } + return -1; /* `key' did not match some condition */ +} + + +/* +** returns the index of a `key' for table traversals. First goes all +** elements in the array part, then elements in the hash part. The +** beginning of a traversal is signaled by -1. +*/ +static int findindex (lua_State *L, Table *t, StkId key) { + int i; + if (ttisnil(key)) return -1; /* first iteration */ + i = arrayindex(key); + if (0 < i && i <= t->sizearray) /* is `key' inside array part?
*/ + return i-1; /* yes; that's the index (corrected to C) */ + else { + Node *n = mainposition(t, key); + for (;;) { /* check whether `key' is somewhere in the chain */ + /* key may be dead already, but it is ok to use it in `next' */ + if (luaV_rawequalobj(gkey(n), key) || + (ttisdeadkey(gkey(n)) && iscollectable(key) && + deadvalue(gkey(n)) == gcvalue(key))) { + i = cast_int(n - gnode(t, 0)); /* key index in hash table */ + /* hash elements are numbered after array ones */ + return i + t->sizearray; + } + else n = gnext(n); + if (n == NULL) + luaG_runerror(L, "invalid key to " LUA_QL("next")); /* key not found */ + } + } +} + + +int luaH_next (lua_State *L, Table *t, StkId key) { + int i = findindex(L, t, key); /* find original element */ + for (i++; i < t->sizearray; i++) { /* try first array part */ + if (!ttisnil(&t->array[i])) { /* a non-nil value? */ + setnvalue(key, cast_num(i+1)); + setobj2s(L, key+1, &t->array[i]); + return 1; + } + } + for (i -= t->sizearray; i < sizenode(t); i++) { /* then hash part */ + if (!ttisnil(gval(gnode(t, i)))) { /* a non-nil value? */ + setobj2s(L, key, gkey(gnode(t, i))); + setobj2s(L, key+1, gval(gnode(t, i))); + return 1; + } + } + return 0; /* no more elements */ +} + + +/* +** {============================================================= +** Rehash +** ============================================================== +*/ + + +static int computesizes (int nums[], int *narray) { + int i; + int twotoi; /* 2^i */ + int a = 0; /* number of elements smaller than 2^i */ + int na = 0; /* number of elements to go to array part */ + int n = 0; /* optimal size for array part */ + for (i = 0, twotoi = 1; twotoi/2 < *narray; i++, twotoi *= 2) { + if (nums[i] > 0) { + a += nums[i]; + if (a > twotoi/2) { /* more than half elements present? */ + n = twotoi; /* optimal size (till now) */ + na = a; /* all elements smaller than n will go to array part */ + } + } + if (a == *narray) break; /* all elements already counted */ + } + *narray = n; + lua_assert(*narray/2 <= na && na <= *narray); + return na; +} + + +static int countint (const TValue *key, int *nums) { + int k = arrayindex(key); + if (0 < k && k <= MAXASIZE) { /* is `key' an appropriate array index? 
*/ + nums[luaO_ceillog2(k)]++; /* count as such */ + return 1; + } + else + return 0; +} + + +static int numusearray (const Table *t, int *nums) { + int lg; + int ttlg; /* 2^lg */ + int ause = 0; /* summation of `nums' */ + int i = 1; /* count to traverse all array keys */ + for (lg=0, ttlg=1; lg<=MAXBITS; lg++, ttlg*=2) { /* for each slice */ + int lc = 0; /* counter */ + int lim = ttlg; + if (lim > t->sizearray) { + lim = t->sizearray; /* adjust upper limit */ + if (i > lim) + break; /* no more elements to count */ + } + /* count elements in range (2^(lg-1), 2^lg] */ + for (; i <= lim; i++) { + if (!ttisnil(&t->array[i-1])) + lc++; + } + nums[lg] += lc; + ause += lc; + } + return ause; +} + + +static int numusehash (const Table *t, int *nums, int *pnasize) { + int totaluse = 0; /* total number of elements */ + int ause = 0; /* summation of `nums' */ + int i = sizenode(t); + while (i--) { + Node *n = &t->node[i]; + if (!ttisnil(gval(n))) { + ause += countint(gkey(n), nums); + totaluse++; + } + } + *pnasize += ause; + return totaluse; +} + + +static void setarrayvector (lua_State *L, Table *t, int size) { + int i; + luaM_reallocvector(L, t->array, t->sizearray, size, TValue); + for (i=t->sizearray; i<size; i++) + setnilvalue(&t->array[i]); + t->sizearray = size; +} + + +static void setnodevector (lua_State *L, Table *t, int size) { + int lsize; + if (size == 0) { /* no elements to hash part? */ + t->node = cast(Node *, dummynode); /* use common `dummynode' */ + lsize = 0; + } + else { + int i; + lsize = luaO_ceillog2(size); + if (lsize > MAXBITS) + luaG_runerror(L, "table overflow"); + size = twoto(lsize); + t->node = luaM_newvector(L, size, Node); + for (i=0; i<size; i++) { + Node *n = gnode(t, i); + gnext(n) = NULL; + setnilvalue(gkey(n)); + setnilvalue(gval(n)); + } + } + t->lsizenode = cast_byte(lsize); + t->lastfree = gnode(t, size); /* all positions are free */ +} + + +void luaH_resize (lua_State *L, Table *t, int nasize, int nhsize) { + int i; + int oldasize = t->sizearray; + int oldhsize = t->lsizenode; + Node *nold = t->node; /* save old hash ... */ + if (nasize > oldasize) /* array part must grow? */ + setarrayvector(L, t, nasize); + /* create new hash part with appropriate size */ + setnodevector(L, t, nhsize); + if (nasize < oldasize) { /* array part must shrink? */ + t->sizearray = nasize; + /* re-insert elements from vanishing slice */ + for (i=nasize; i<oldasize; i++) { + if (!ttisnil(&t->array[i])) + luaH_setint(L, t, i + 1, &t->array[i]); + } + /* shrink array */ + luaM_reallocvector(L, t->array, oldasize, nasize, TValue); + } + /* re-insert elements from hash part */ + for (i = twoto(oldhsize) - 1; i >= 0; i--) { + Node *old = nold+i; + if (!ttisnil(gval(old))) { + /* doesn't need barrier/invalidate cache, as entry was + already present in the table */ + setobjt2t(L, luaH_set(L, t, gkey(old)), gval(old)); + } + } + if (!isdummy(nold)) + luaM_freearray(L, nold, cast(size_t, twoto(oldhsize))); /* free old array */ +} + + +void luaH_resizearray (lua_State *L, Table *t, int nasize) { + int nsize = isdummy(t->node) ?
0 : sizenode(t); + luaH_resize(L, t, nasize, nsize); +} + + +static void rehash (lua_State *L, Table *t, const TValue *ek) { + int nasize, na; + int nums[MAXBITS+1]; /* nums[i] = number of keys with 2^(i-1) < k <= 2^i */ + int i; + int totaluse; + for (i=0; i<=MAXBITS; i++) nums[i] = 0; /* reset counts */ + nasize = numusearray(t, nums); /* count keys in array part */ + totaluse = nasize; /* all those keys are integer keys */ + totaluse += numusehash(t, nums, &nasize); /* count keys in hash part */ + /* count extra key */ + nasize += countint(ek, nums); + totaluse++; + /* compute new size for array part */ + na = computesizes(nums, &nasize); + /* resize the table to new computed sizes */ + luaH_resize(L, t, nasize, totaluse - na); +} + + + +/* +** }============================================================= +*/ + + +Table *luaH_new (lua_State *L) { + Table *t = &luaC_newobj(L, LUA_TTABLE, sizeof(Table), NULL, 0)->h; + t->metatable = NULL; + t->flags = cast_byte(~0); + t->array = NULL; + t->sizearray = 0; + setnodevector(L, t, 0); + return t; +} + + +void luaH_free (lua_State *L, Table *t) { + if (!isdummy(t->node)) + luaM_freearray(L, t->node, cast(size_t, sizenode(t))); + luaM_freearray(L, t->array, t->sizearray); + luaM_free(L, t); +} + + +static Node *getfreepos (Table *t) { + while (t->lastfree > t->node) { + t->lastfree--; + if (ttisnil(gkey(t->lastfree))) + return t->lastfree; + } + return NULL; /* could not find a free place */ +} + + + +/* +** inserts a new key into a hash table; first, check whether key's main +** position is free. If not, check whether colliding node is in its main +** position or not: if it is not, move colliding node to an empty place and +** put new key in its main position; otherwise (colliding node is in its main +** position), new key goes to an empty position. +*/ +TValue *luaH_newkey (lua_State *L, Table *t, const TValue *key) { + Node *mp; + if (ttisnil(key)) luaG_runerror(L, "table index is nil"); + else if (ttisnumber(key) && luai_numisnan(L, nvalue(key))) + luaG_runerror(L, "table index is NaN"); + mp = mainposition(t, key); + if (!ttisnil(gval(mp)) || isdummy(mp)) { /* main position is taken? */ + Node *othern; + Node *n = getfreepos(t); /* get a free place */ + if (n == NULL) { /* cannot find a free place? */ + rehash(L, t, key); /* grow table */ + /* whatever called 'newkey' take care of TM cache and GC barrier */ + return luaH_set(L, t, key); /* insert key into grown table */ + } + lua_assert(!isdummy(n)); + othern = mainposition(t, gkey(mp)); + if (othern != mp) { /* is colliding node out of its main position? */ + /* yes; move colliding node into free position */ + while (gnext(othern) != mp) othern = gnext(othern); /* find previous */ + gnext(othern) = n; /* redo the chain with `n' in place of `mp' */ + *n = *mp; /* copy colliding node into free pos. 
(mp->next also goes) */ + gnext(mp) = NULL; /* now `mp' is free */ + setnilvalue(gval(mp)); + } + else { /* colliding node is in its own main position */ + /* new node will go into free position */ + gnext(n) = gnext(mp); /* chain new position */ + gnext(mp) = n; + mp = n; + } + } + setobj2t(L, gkey(mp), key); + luaC_barrierback(L, obj2gco(t), key); + lua_assert(ttisnil(gval(mp))); + return gval(mp); +} + + +/* +** search function for integers +*/ +const TValue *luaH_getint (Table *t, int key) { + /* (1 <= key && key <= t->sizearray) */ + if (cast(unsigned int, key-1) < cast(unsigned int, t->sizearray)) + return &t->array[key-1]; + else { + lua_Number nk = cast_num(key); + Node *n = hashnum(t, nk); + do { /* check whether `key' is somewhere in the chain */ + if (ttisnumber(gkey(n)) && luai_numeq(nvalue(gkey(n)), nk)) + return gval(n); /* that's it */ + else n = gnext(n); + } while (n); + return luaO_nilobject; + } +} + + +/* +** search function for short strings +*/ +const TValue *luaH_getstr (Table *t, TString *key) { + Node *n = hashstr(t, key); + lua_assert(key->tsv.tt == LUA_TSHRSTR); + do { /* check whether `key' is somewhere in the chain */ + if (ttisshrstring(gkey(n)) && eqshrstr(rawtsvalue(gkey(n)), key)) + return gval(n); /* that's it */ + else n = gnext(n); + } while (n); + return luaO_nilobject; +} + + +/* +** main search function +*/ +const TValue *luaH_get (Table *t, const TValue *key) { + switch (ttype(key)) { + case LUA_TSHRSTR: return luaH_getstr(t, rawtsvalue(key)); + case LUA_TNIL: return luaO_nilobject; + case LUA_TNUMBER: { + int k; + lua_Number n = nvalue(key); + lua_number2int(k, n); + if (luai_numeq(cast_num(k), n)) /* index is int? */ + return luaH_getint(t, k); /* use specialized version */ + /* else go through */ + } + default: { + Node *n = mainposition(t, key); + do { /* check whether `key' is somewhere in the chain */ + if (luaV_rawequalobj(gkey(n), key)) + return gval(n); /* that's it */ + else n = gnext(n); + } while (n); + return luaO_nilobject; + } + } +} + + +/* +** beware: when using this function you probably need to check a GC +** barrier and invalidate the TM cache. +*/ +TValue *luaH_set (lua_State *L, Table *t, const TValue *key) { + const TValue *p = luaH_get(t, key); + if (p != luaO_nilobject) + return cast(TValue *, p); + else return luaH_newkey(L, t, key); +} + + +void luaH_setint (lua_State *L, Table *t, int key, TValue *value) { + const TValue *p = luaH_getint(t, key); + TValue *cell; + if (p != luaO_nilobject) + cell = cast(TValue *, p); + else { + TValue k; + setnvalue(&k, cast_num(key)); + cell = luaH_newkey(L, t, &k); + } + setobj2t(L, cell, value); +} + + +static int unbound_search (Table *t, unsigned int j) { + unsigned int i = j; /* i is zero or a present index */ + j++; + /* find `i' and `j' such that i is present and j is not */ + while (!ttisnil(luaH_getint(t, j))) { + i = j; + j *= 2; + if (j > cast(unsigned int, MAX_INT)) { /* overflow? */ + /* table was built with bad purposes: resort to linear search */ + i = 1; + while (!ttisnil(luaH_getint(t, i))) i++; + return i - 1; + } + } + /* now do a binary search between them */ + while (j - i > 1) { + unsigned int m = (i+j)/2; + if (ttisnil(luaH_getint(t, m))) j = m; + else i = m; + } + return i; +} + + +/* +** Try to find a boundary in table `t'. A `boundary' is an integer index +** such that t[i] is non-nil and t[i+1] is nil (and 0 if t[1] is nil). 
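+** For example, in t = {10, 20, 30} the only boundary is 3; if t[5] is +** also set while t[4] is nil, both 3 and 5 are boundaries and the +** search below may return either one.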
+*/ +int luaH_getn (Table *t) { + unsigned int j = t->sizearray; + if (j > 0 && ttisnil(&t->array[j - 1])) { + /* there is a boundary in the array part: (binary) search for it */ + unsigned int i = 0; + while (j - i > 1) { + unsigned int m = (i+j)/2; + if (ttisnil(&t->array[m - 1])) j = m; + else i = m; + } + return i; + } + /* else must find a boundary in hash part */ + else if (isdummy(t->node)) /* hash part is empty? */ + return j; /* that is easy... */ + else return unbound_search(t, j); +} + + + +#if defined(LUA_DEBUG) + +Node *luaH_mainposition (const Table *t, const TValue *key) { + return mainposition(t, key); +} + +int luaH_isdummy (Node *n) { return isdummy(n); } + +#endif diff --git a/ext/lua/src/ltablib.c b/ext/lua/src/ltablib.c new file mode 100644 index 000000000..ad798b4e2 --- /dev/null +++ b/ext/lua/src/ltablib.c @@ -0,0 +1,283 @@ +/* +** $Id: ltablib.c,v 1.65 2013/03/07 18:17:24 roberto Exp $ +** Library for Table Manipulation +** See Copyright Notice in lua.h +*/ + + +#include <stddef.h> + +#define ltablib_c +#define LUA_LIB + +#include "lua.h" + +#include "lauxlib.h" +#include "lualib.h" + + +#define aux_getn(L,n) (luaL_checktype(L, n, LUA_TTABLE), luaL_len(L, n)) + + + +#if defined(LUA_COMPAT_MAXN) +static int maxn (lua_State *L) { + lua_Number max = 0; + luaL_checktype(L, 1, LUA_TTABLE); + lua_pushnil(L); /* first key */ + while (lua_next(L, 1)) { + lua_pop(L, 1); /* remove value */ + if (lua_type(L, -1) == LUA_TNUMBER) { + lua_Number v = lua_tonumber(L, -1); + if (v > max) max = v; + } + } + lua_pushnumber(L, max); + return 1; +} +#endif + + +static int tinsert (lua_State *L) { + int e = aux_getn(L, 1) + 1; /* first empty element */ + int pos; /* where to insert new element */ + switch (lua_gettop(L)) { + case 2: { /* called with only 2 arguments */ + pos = e; /* insert new element at the end */ + break; + } + case 3: { + int i; + pos = luaL_checkint(L, 2); /* 2nd argument is the position */ + luaL_argcheck(L, 1 <= pos && pos <= e, 2, "position out of bounds"); + for (i = e; i > pos; i--) { /* move up elements */ + lua_rawgeti(L, 1, i-1); + lua_rawseti(L, 1, i); /* t[i] = t[i-1] */ + } + break; + } + default: { + return luaL_error(L, "wrong number of arguments to " LUA_QL("insert")); + } + } + lua_rawseti(L, 1, pos); /* t[pos] = v */ + return 0; +} + + +static int tremove (lua_State *L) { + int size = aux_getn(L, 1); + int pos = luaL_optint(L, 2, size); + if (pos != size) /* validate 'pos' if given */ + luaL_argcheck(L, 1 <= pos && pos <= size + 1, 1, "position out of bounds"); + lua_rawgeti(L, 1, pos); /* result = t[pos] */ + for ( ; pos < size; pos++) { + lua_rawgeti(L, 1, pos+1); + lua_rawseti(L, 1, pos); /* t[pos] = t[pos+1] */ + } + lua_pushnil(L); + lua_rawseti(L, 1, pos); /* t[pos] = nil */ + return 1; +} + + +static void addfield (lua_State *L, luaL_Buffer *b, int i) { + lua_rawgeti(L, 1, i); + if (!lua_isstring(L, -1)) + luaL_error(L, "invalid value (%s) at index %d in table for " + LUA_QL("concat"), luaL_typename(L, -1), i); + luaL_addvalue(b); +} + + +static int tconcat (lua_State *L) { + luaL_Buffer b; + size_t lsep; + int i, last; + const char *sep = luaL_optlstring(L, 2, "", &lsep); + luaL_checktype(L, 1, LUA_TTABLE); + i = luaL_optint(L, 3, 1); + last = luaL_opt(L, luaL_checkint, 4, luaL_len(L, 1)); + luaL_buffinit(L, &b); + for (; i < last; i++) { + addfield(L, &b, i); + luaL_addlstring(&b, sep, lsep); + } + if (i == last) /* add last value (if interval was not empty) */ + addfield(L, &b, i); + luaL_pushresult(&b); + return 1; +} + + +/* +**
{====================================================== +** Pack/unpack +** ======================================================= +*/ + +static int pack (lua_State *L) { + int n = lua_gettop(L); /* number of elements to pack */ + lua_createtable(L, n, 1); /* create result table */ + lua_pushinteger(L, n); + lua_setfield(L, -2, "n"); /* t.n = number of elements */ + if (n > 0) { /* at least one element? */ + int i; + lua_pushvalue(L, 1); + lua_rawseti(L, -2, 1); /* insert first element */ + lua_replace(L, 1); /* move table into index 1 */ + for (i = n; i >= 2; i--) /* assign other elements */ + lua_rawseti(L, 1, i); + } + return 1; /* return table */ +} + + +static int unpack (lua_State *L) { + int i, e, n; + luaL_checktype(L, 1, LUA_TTABLE); + i = luaL_optint(L, 2, 1); + e = luaL_opt(L, luaL_checkint, 3, luaL_len(L, 1)); + if (i > e) return 0; /* empty range */ + n = e - i + 1; /* number of elements */ + if (n <= 0 || !lua_checkstack(L, n)) /* n <= 0 means arith. overflow */ + return luaL_error(L, "too many results to unpack"); + lua_rawgeti(L, 1, i); /* push arg[i] (avoiding overflow problems) */ + while (i++ < e) /* push arg[i + 1...e] */ + lua_rawgeti(L, 1, i); + return n; +} + +/* }====================================================== */ + + + +/* +** {====================================================== +** Quicksort +** (based on `Algorithms in MODULA-3', Robert Sedgewick; +** Addison-Wesley, 1993.) +** ======================================================= +*/ + + +static void set2 (lua_State *L, int i, int j) { + lua_rawseti(L, 1, i); + lua_rawseti(L, 1, j); +} + +static int sort_comp (lua_State *L, int a, int b) { + if (!lua_isnil(L, 2)) { /* function? */ + int res; + lua_pushvalue(L, 2); + lua_pushvalue(L, a-1); /* -1 to compensate function */ + lua_pushvalue(L, b-2); /* -2 to compensate function and `a' */ + lua_call(L, 2, 1); + res = lua_toboolean(L, -1); + lua_pop(L, 1); + return res; + } + else /* a < b? */ + return lua_compare(L, a, b, LUA_OPLT); +} + +static void auxsort (lua_State *L, int l, int u) { + while (l < u) { /* for tail recursion */ + int i, j; + /* sort elements a[l], a[(l+u)/2] and a[u] */ + lua_rawgeti(L, 1, l); + lua_rawgeti(L, 1, u); + if (sort_comp(L, -1, -2)) /* a[u] < a[l]? 
*/ + set2(L, l, u); /* swap a[l] - a[u] */ + else + lua_pop(L, 2); + if (u-l == 1) break; /* only 2 elements */ + i = (l+u)/2; + lua_rawgeti(L, 1, i); + lua_rawgeti(L, 1, l); + if (sort_comp(L, -2, -1)) /* a[i]<a[l]? */ + set2(L, i, l); /* swap a[i] - a[l] */ + else + lua_pop(L, 1); /* remove second element */ + lua_rawgeti(L, 1, i); + lua_rawgeti(L, 1, u); + if (sort_comp(L, -1, -2)) /* a[u]<a[i]? */ + set2(L, i, u); /* swap a[u] - a[i] */ + else + lua_pop(L, 2); + if (u-l == 2) break; /* only 3 elements */ + lua_rawgeti(L, 1, i); + lua_pushvalue(L, -1); + lua_rawgeti(L, 1, u-1); + set2(L, i, u-1); + /* a[l] <= P == a[u-1] <= a[u], only need to sort from l+1 to u-2 */ + i = l; j = u-1; + for (;;) { /* invariant: a[l..i] <= P <= a[j..u] */ + /* repeat ++i until a[i] >= P */ + while (lua_rawgeti(L, 1, ++i), sort_comp(L, -1, -2)) { + if (i>=u) luaL_error(L, "invalid order function for sorting"); + lua_pop(L, 1); /* remove a[i] */ + } + /* repeat --j until a[j] <= P */ + while (lua_rawgeti(L, 1, --j), sort_comp(L, -3, -1)) { + if (j<=l) luaL_error(L, "invalid order function for sorting"); + lua_pop(L, 1); /* remove a[j] */ + } + if (j<i) { + lua_pop(L, 3); /* pop pivot, a[i], a[j] */ + break; + } + set2(L, i, j); + } + lua_rawgeti(L, 1, u-1); + lua_rawgeti(L, 1, i); + set2(L, u-1, i); /* swap pivot (a[u-1]) with a[i] */ + /* a[l..i-1] <= a[i] == P <= a[i+1..u] */ + /* adjust so that smaller half is in [j..i] and larger one in [l..u] */ + if (i-l < u-i) { + j=l; i=i-1; l=i+2; + } + else { + j=i+1; i=u; u=j-2; + } + auxsort(L, j, i); /* call recursively the smaller one */ + } /* repeat the routine for the larger one */ +} + + +static int sort (lua_State *L) { + int n = aux_getn(L, 1); + luaL_checkstack(L, 40, ""); /* assume array is smaller than 2^40 */ + lua_settop(L, 2); /* make sure there is two arguments */ + if (!lua_isnil(L, 2)) /* is there a 2nd argument? */ + luaL_checktype(L, 2, LUA_TFUNCTION); + auxsort(L, 1, n); + return 0; +} + +/* }====================================================== */ + + +static const luaL_Reg tab_funcs[] = { + {"concat", tconcat}, +#if defined(LUA_COMPAT_MAXN) + {"maxn", maxn}, +#endif + {"insert", tinsert}, + {"pack", pack}, + {"unpack", unpack}, + {"remove", tremove}, + {"sort", sort}, + {NULL, NULL} +}; + + +LUAMOD_API int luaopen_table (lua_State *L) { + luaL_newlib(L, tab_funcs); +#if defined(LUA_COMPAT_UNPACK) + /* _G.unpack = table.unpack */ + lua_getfield(L, -1, "unpack"); + lua_setglobal(L, "unpack"); +#endif + return 1; +} + diff --git a/ext/lua/src/ltm.c b/ext/lua/src/ltm.c new file mode 100644 --- /dev/null +++ b/ext/lua/src/ltm.c +/* +** Tag methods +** See Copyright Notice in lua.h +*/ + + +#include <string.h> + +#define ltm_c +#define LUA_CORE + +#include "lua.h" + +#include "lobject.h" +#include "lstate.h" +#include "lstring.h" +#include "ltable.h" +#include "ltm.h" + + +static const char udatatypename[] = "userdata"; + +LUAI_DDEF const char *const luaT_typenames_[LUA_TOTALTAGS] = { + "no value", + "nil", "boolean", udatatypename, "number", + "string", "table", "function", udatatypename, "thread", + "proto", "upval" /* these last two cases are used for tests only */ +}; + + +void luaT_init (lua_State *L) { + static const char *const luaT_eventname[] = { /* ORDER TM */ + "__index", "__newindex", + "__gc", "__mode", "__len", "__eq", + "__add", "__sub", "__mul", "__div", "__mod", + "__pow", "__unm", "__lt", "__le", + "__concat", "__call" + }; + int i; + for (i=0; i<TM_N; i++) { + G(L)->tmname[i] = luaS_new(L, luaT_eventname[i]); + luaS_fix(G(L)->tmname[i]); /* never collect these names */ + } +} + + +/* +** function to be used with macro "fasttm": optimized for absence of +** tag methods +*/ +const TValue *luaT_gettm (Table *events, TMS event, TString *ename) { + const TValue *tm = luaH_getstr(events, ename); + lua_assert(event <= TM_EQ); + if (ttisnil(tm)) { /* no tag method? */ + events->flags |= cast_byte(1u<<event); /* cache this fact */ + return NULL; + } + else return tm; +} + + +const TValue *luaT_gettmbyobj (lua_State *L, const TValue *o, TMS event) { + Table *mt; + switch (ttypenv(o)) { + case LUA_TTABLE: + mt = hvalue(o)->metatable; + break; + case LUA_TUSERDATA: + mt = uvalue(o)->metatable; + break; + default: + mt = G(L)->mt[ttypenv(o)]; + } + return (mt ? luaH_getstr(mt, G(L)->tmname[event]) : luaO_nilobject); +} +
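A sketch of how these lookups are reached from the embedding side (my_index is a hypothetical metamethod; a missed field access on the table below routes through luaT_gettmbyobj for the __index event):

static int my_index (lua_State *L) {  /* hypothetical __index handler */
  lua_pushliteral(L, "default");
  return 1;
}

static void install_mt (lua_State *L) {
  lua_newtable(L);                  /* t */
  lua_newtable(L);                  /* mt */
  lua_pushcfunction(L, my_index);
  lua_setfield(L, -2, "__index");   /* mt.__index = my_index */
  lua_setmetatable(L, -2);          /* setmetatable(t, mt) */
  lua_getfield(L, -1, "absent");    /* miss in t: the __index tag method runs */
  lua_pop(L, 2);                    /* pop result and t */
}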
luaH_getstr(mt, G(L)->tmname[event]) : luaO_nilobject); +} + diff --git a/ext/lua/src/lundump.c b/ext/lua/src/lundump.c new file mode 100644 index 000000000..54de011a4 --- /dev/null +++ b/ext/lua/src/lundump.c @@ -0,0 +1,258 @@ +/* +** $Id: lundump.c,v 2.22 2012/05/08 13:53:33 roberto Exp $ +** load precompiled Lua chunks +** See Copyright Notice in lua.h +*/ + +#include + +#define lundump_c +#define LUA_CORE + +#include "lua.h" + +#include "ldebug.h" +#include "ldo.h" +#include "lfunc.h" +#include "lmem.h" +#include "lobject.h" +#include "lstring.h" +#include "lundump.h" +#include "lzio.h" + +typedef struct { + lua_State* L; + ZIO* Z; + Mbuffer* b; + const char* name; +} LoadState; + +static l_noret error(LoadState* S, const char* why) +{ + luaO_pushfstring(S->L,"%s: %s precompiled chunk",S->name,why); + luaD_throw(S->L,LUA_ERRSYNTAX); +} + +#define LoadMem(S,b,n,size) LoadBlock(S,b,(n)*(size)) +#define LoadByte(S) (lu_byte)LoadChar(S) +#define LoadVar(S,x) LoadMem(S,&x,1,sizeof(x)) +#define LoadVector(S,b,n,size) LoadMem(S,b,n,size) + +#if !defined(luai_verifycode) +#define luai_verifycode(L,b,f) /* empty */ +#endif + +static void LoadBlock(LoadState* S, void* b, size_t size) +{ + if (luaZ_read(S->Z,b,size)!=0) error(S,"truncated"); +} + +static int LoadChar(LoadState* S) +{ + char x; + LoadVar(S,x); + return x; +} + +static int LoadInt(LoadState* S) +{ + int x; + LoadVar(S,x); + if (x<0) error(S,"corrupted"); + return x; +} + +static lua_Number LoadNumber(LoadState* S) +{ + lua_Number x; + LoadVar(S,x); + return x; +} + +static TString* LoadString(LoadState* S) +{ + size_t size; + LoadVar(S,size); + if (size==0) + return NULL; + else + { + char* s=luaZ_openspace(S->L,S->b,size); + LoadBlock(S,s,size*sizeof(char)); + return luaS_newlstr(S->L,s,size-1); /* remove trailing '\0' */ + } +} + +static void LoadCode(LoadState* S, Proto* f) +{ + int n=LoadInt(S); + f->code=luaM_newvector(S->L,n,Instruction); + f->sizecode=n; + LoadVector(S,f->code,n,sizeof(Instruction)); +} + +static void LoadFunction(LoadState* S, Proto* f); + +static void LoadConstants(LoadState* S, Proto* f) +{ + int i,n; + n=LoadInt(S); + f->k=luaM_newvector(S->L,n,TValue); + f->sizek=n; + for (i=0; ik[i]); + for (i=0; ik[i]; + int t=LoadChar(S); + switch (t) + { + case LUA_TNIL: + setnilvalue(o); + break; + case LUA_TBOOLEAN: + setbvalue(o,LoadChar(S)); + break; + case LUA_TNUMBER: + setnvalue(o,LoadNumber(S)); + break; + case LUA_TSTRING: + setsvalue2n(S->L,o,LoadString(S)); + break; + default: lua_assert(0); + } + } + n=LoadInt(S); + f->p=luaM_newvector(S->L,n,Proto*); + f->sizep=n; + for (i=0; ip[i]=NULL; + for (i=0; ip[i]=luaF_newproto(S->L); + LoadFunction(S,f->p[i]); + } +} + +static void LoadUpvalues(LoadState* S, Proto* f) +{ + int i,n; + n=LoadInt(S); + f->upvalues=luaM_newvector(S->L,n,Upvaldesc); + f->sizeupvalues=n; + for (i=0; iupvalues[i].name=NULL; + for (i=0; iupvalues[i].instack=LoadByte(S); + f->upvalues[i].idx=LoadByte(S); + } +} + +static void LoadDebug(LoadState* S, Proto* f) +{ + int i,n; + f->source=LoadString(S); + n=LoadInt(S); + f->lineinfo=luaM_newvector(S->L,n,int); + f->sizelineinfo=n; + LoadVector(S,f->lineinfo,n,sizeof(int)); + n=LoadInt(S); + f->locvars=luaM_newvector(S->L,n,LocVar); + f->sizelocvars=n; + for (i=0; ilocvars[i].varname=NULL; + for (i=0; ilocvars[i].varname=LoadString(S); + f->locvars[i].startpc=LoadInt(S); + f->locvars[i].endpc=LoadInt(S); + } + n=LoadInt(S); + for (i=0; iupvalues[i].name=LoadString(S); +} + +static void LoadFunction(LoadState* S, Proto* f) +{ + 
f->linedefined=LoadInt(S); + f->lastlinedefined=LoadInt(S); + f->numparams=LoadByte(S); + f->is_vararg=LoadByte(S); + f->maxstacksize=LoadByte(S); + LoadCode(S,f); + LoadConstants(S,f); + LoadUpvalues(S,f); + LoadDebug(S,f); +} + +/* the code below must be consistent with the code in luaU_header */ +#define N0 LUAC_HEADERSIZE +#define N1 (sizeof(LUA_SIGNATURE)-sizeof(char)) +#define N2 N1+2 +#define N3 N2+6 + +static void LoadHeader(LoadState* S) +{ + lu_byte h[LUAC_HEADERSIZE]; + lu_byte s[LUAC_HEADERSIZE]; + luaU_header(h); + memcpy(s,h,sizeof(char)); /* first char already read */ + LoadBlock(S,s+sizeof(char),LUAC_HEADERSIZE-sizeof(char)); + if (memcmp(h,s,N0)==0) return; + if (memcmp(h,s,N1)!=0) error(S,"not a"); + if (memcmp(h,s,N2)!=0) error(S,"version mismatch in"); + if (memcmp(h,s,N3)!=0) error(S,"incompatible"); else error(S,"corrupted"); +} + +/* +** load precompiled chunk +*/ +Closure* luaU_undump (lua_State* L, ZIO* Z, Mbuffer* buff, const char* name) +{ + LoadState S; + Closure* cl; + if (*name=='@' || *name=='=') + S.name=name+1; + else if (*name==LUA_SIGNATURE[0]) + S.name="binary string"; + else + S.name=name; + S.L=L; + S.Z=Z; + S.b=buff; + LoadHeader(&S); + cl=luaF_newLclosure(L,1); + setclLvalue(L,L->top,cl); incr_top(L); + cl->l.p=luaF_newproto(L); + LoadFunction(&S,cl->l.p); + if (cl->l.p->sizeupvalues != 1) + { + Proto* p=cl->l.p; + cl=luaF_newLclosure(L,cl->l.p->sizeupvalues); + cl->l.p=p; + setclLvalue(L,L->top-1,cl); + } + luai_verifycode(L,buff,cl->l.p); + return cl; +} + +#define MYINT(s) (s[0]-'0') +#define VERSION MYINT(LUA_VERSION_MAJOR)*16+MYINT(LUA_VERSION_MINOR) +#define FORMAT 0 /* this is the official format */ + +/* +* make header for precompiled chunks +* if you change the code below be sure to update LoadHeader and FORMAT above +* and LUAC_HEADERSIZE in lundump.h +*/ +void luaU_header (lu_byte* h) +{ + int x=1; + memcpy(h,LUA_SIGNATURE,sizeof(LUA_SIGNATURE)-sizeof(char)); + h+=sizeof(LUA_SIGNATURE)-sizeof(char); + *h++=cast_byte(VERSION); + *h++=cast_byte(FORMAT); + *h++=cast_byte(*(char*)&x); /* endianness */ + *h++=cast_byte(sizeof(int)); + *h++=cast_byte(sizeof(size_t)); + *h++=cast_byte(sizeof(Instruction)); + *h++=cast_byte(sizeof(lua_Number)); + *h++=cast_byte(((lua_Number)0.5)==0); /* is lua_Number integral? 
*/ + memcpy(h,LUAC_TAIL,sizeof(LUAC_TAIL)-sizeof(char)); +} diff --git a/ext/lua/src/lvm.c b/ext/lua/src/lvm.c new file mode 100644 index 000000000..657d5c456 --- /dev/null +++ b/ext/lua/src/lvm.c @@ -0,0 +1,867 @@ +/* +** $Id: lvm.c,v 2.155 2013/03/16 21:10:18 roberto Exp $ +** Lua virtual machine +** See Copyright Notice in lua.h +*/ + + +#include +#include +#include + +#define lvm_c +#define LUA_CORE + +#include "lua.h" + +#include "ldebug.h" +#include "ldo.h" +#include "lfunc.h" +#include "lgc.h" +#include "lobject.h" +#include "lopcodes.h" +#include "lstate.h" +#include "lstring.h" +#include "ltable.h" +#include "ltm.h" +#include "lvm.h" + + + +/* limit for table tag-method chains (to avoid loops) */ +#define MAXTAGLOOP 100 + + +const TValue *luaV_tonumber (const TValue *obj, TValue *n) { + lua_Number num; + if (ttisnumber(obj)) return obj; + if (ttisstring(obj) && luaO_str2d(svalue(obj), tsvalue(obj)->len, &num)) { + setnvalue(n, num); + return n; + } + else + return NULL; +} + + +int luaV_tostring (lua_State *L, StkId obj) { + if (!ttisnumber(obj)) + return 0; + else { + char s[LUAI_MAXNUMBER2STR]; + lua_Number n = nvalue(obj); + int l = lua_number2str(s, n); + setsvalue2s(L, obj, luaS_newlstr(L, s, l)); + return 1; + } +} + + +static void traceexec (lua_State *L) { + CallInfo *ci = L->ci; + lu_byte mask = L->hookmask; + int counthook = ((mask & LUA_MASKCOUNT) && L->hookcount == 0); + if (counthook) + resethookcount(L); /* reset count */ + if (ci->callstatus & CIST_HOOKYIELD) { /* called hook last time? */ + ci->callstatus &= ~CIST_HOOKYIELD; /* erase mark */ + return; /* do not call hook again (VM yielded, so it did not move) */ + } + if (counthook) + luaD_hook(L, LUA_HOOKCOUNT, -1); /* call count hook */ + if (mask & LUA_MASKLINE) { + Proto *p = ci_func(ci)->p; + int npc = pcRel(ci->u.l.savedpc, p); + int newline = getfuncline(p, npc); + if (npc == 0 || /* call linehook when enter a new function, */ + ci->u.l.savedpc <= L->oldpc || /* when jump back (loop), or when */ + newline != getfuncline(p, pcRel(L->oldpc, p))) /* enter a new line */ + luaD_hook(L, LUA_HOOKLINE, newline); /* call line hook */ + } + L->oldpc = ci->u.l.savedpc; + if (L->status == LUA_YIELD) { /* did hook yield? */ + if (counthook) + L->hookcount = 1; /* undo decrement to zero */ + ci->u.l.savedpc--; /* undo increment (resume will increment it again) */ + ci->callstatus |= CIST_HOOKYIELD; /* mark that it yielded */ + ci->func = L->top - 1; /* protect stack below results */ + luaD_throw(L, LUA_YIELD); + } +} + + +static void callTM (lua_State *L, const TValue *f, const TValue *p1, + const TValue *p2, TValue *p3, int hasres) { + ptrdiff_t result = savestack(L, p3); + setobj2s(L, L->top++, f); /* push function */ + setobj2s(L, L->top++, p1); /* 1st argument */ + setobj2s(L, L->top++, p2); /* 2nd argument */ + if (!hasres) /* no result? 'p3' is third argument */ + setobj2s(L, L->top++, p3); /* 3rd argument */ + /* metamethod may yield only when called from Lua code */ + luaD_call(L, L->top - (4 - hasres), hasres, isLua(L->ci)); + if (hasres) { /* if has result, move it to its place */ + p3 = restorestack(L, result); + setobjs2s(L, p3, --L->top); + } +} + + +void luaV_gettable (lua_State *L, const TValue *t, TValue *key, StkId val) { + int loop; + for (loop = 0; loop < MAXTAGLOOP; loop++) { + const TValue *tm; + if (ttistable(t)) { /* `t' is a table? */ + Table *h = hvalue(t); + const TValue *res = luaH_get(h, key); /* do a primitive get */ + if (!ttisnil(res) || /* result is not nil? 
*/ + (tm = fasttm(L, h->metatable, TM_INDEX)) == NULL) { /* or no TM? */ + setobj2s(L, val, res); + return; + } + /* else will try the tag method */ + } + else if (ttisnil(tm = luaT_gettmbyobj(L, t, TM_INDEX))) + luaG_typeerror(L, t, "index"); + if (ttisfunction(tm)) { + callTM(L, tm, t, key, val, 1); + return; + } + t = tm; /* else repeat with 'tm' */ + } + luaG_runerror(L, "loop in gettable"); +} + + +void luaV_settable (lua_State *L, const TValue *t, TValue *key, StkId val) { + int loop; + for (loop = 0; loop < MAXTAGLOOP; loop++) { + const TValue *tm; + if (ttistable(t)) { /* `t' is a table? */ + Table *h = hvalue(t); + TValue *oldval = cast(TValue *, luaH_get(h, key)); + /* if previous value is not nil, there must be a previous entry + in the table; moreover, a metamethod has no relevance */ + if (!ttisnil(oldval) || + /* previous value is nil; must check the metamethod */ + ((tm = fasttm(L, h->metatable, TM_NEWINDEX)) == NULL && + /* no metamethod; is there a previous entry in the table? */ + (oldval != luaO_nilobject || + /* no previous entry; must create one. (The next test is + always true; we only need the assignment.) */ + (oldval = luaH_newkey(L, h, key), 1)))) { + /* no metamethod and (now) there is an entry with given key */ + setobj2t(L, oldval, val); /* assign new value to that entry */ + invalidateTMcache(h); + luaC_barrierback(L, obj2gco(h), val); + return; + } + /* else will try the metamethod */ + } + else /* not a table; check metamethod */ + if (ttisnil(tm = luaT_gettmbyobj(L, t, TM_NEWINDEX))) + luaG_typeerror(L, t, "index"); + /* there is a metamethod */ + if (ttisfunction(tm)) { + callTM(L, tm, t, key, val, 0); + return; + } + t = tm; /* else repeat with 'tm' */ + } + luaG_runerror(L, "loop in settable"); +} + + +static int call_binTM (lua_State *L, const TValue *p1, const TValue *p2, + StkId res, TMS event) { + const TValue *tm = luaT_gettmbyobj(L, p1, event); /* try first operand */ + if (ttisnil(tm)) + tm = luaT_gettmbyobj(L, p2, event); /* try second operand */ + if (ttisnil(tm)) return 0; + callTM(L, tm, p1, p2, res, 1); + return 1; +} + + +static const TValue *get_equalTM (lua_State *L, Table *mt1, Table *mt2, + TMS event) { + const TValue *tm1 = fasttm(L, mt1, event); + const TValue *tm2; + if (tm1 == NULL) return NULL; /* no metamethod */ + if (mt1 == mt2) return tm1; /* same metatables => same metamethods */ + tm2 = fasttm(L, mt2, event); + if (tm2 == NULL) return NULL; /* no metamethod */ + if (luaV_rawequalobj(tm1, tm2)) /* same metamethods? */ + return tm1; + return NULL; +} + + +static int call_orderTM (lua_State *L, const TValue *p1, const TValue *p2, + TMS event) { + if (!call_binTM(L, p1, p2, L->top, event)) + return -1; /* no metamethod */ + else + return !l_isfalse(L->top); +} + + +static int l_strcmp (const TString *ls, const TString *rs) { + const char *l = getstr(ls); + size_t ll = ls->tsv.len; + const char *r = getstr(rs); + size_t lr = rs->tsv.len; + for (;;) { + int temp = strcoll(l, r); + if (temp != 0) return temp; + else { /* strings are equal up to a `\0' */ + size_t len = strlen(l); /* index of first `\0' in both strings */ + if (len == lr) /* r is finished? */ + return (len == ll) ? 0 : 1; + else if (len == ll) /* l is finished? 
*/ + return -1; /* l is smaller than r (because r is not finished) */ + /* both strings longer than `len'; go on comparing (after the `\0') */ + len++; + l += len; ll -= len; r += len; lr -= len; + } + } +} + + +int luaV_lessthan (lua_State *L, const TValue *l, const TValue *r) { + int res; + if (ttisnumber(l) && ttisnumber(r)) + return luai_numlt(L, nvalue(l), nvalue(r)); + else if (ttisstring(l) && ttisstring(r)) + return l_strcmp(rawtsvalue(l), rawtsvalue(r)) < 0; + else if ((res = call_orderTM(L, l, r, TM_LT)) < 0) + luaG_ordererror(L, l, r); + return res; +} + + +int luaV_lessequal (lua_State *L, const TValue *l, const TValue *r) { + int res; + if (ttisnumber(l) && ttisnumber(r)) + return luai_numle(L, nvalue(l), nvalue(r)); + else if (ttisstring(l) && ttisstring(r)) + return l_strcmp(rawtsvalue(l), rawtsvalue(r)) <= 0; + else if ((res = call_orderTM(L, l, r, TM_LE)) >= 0) /* first try `le' */ + return res; + else if ((res = call_orderTM(L, r, l, TM_LT)) < 0) /* else try `lt' */ + luaG_ordererror(L, l, r); + return !res; +} + + +/* +** equality of Lua values. L == NULL means raw equality (no metamethods) +*/ +int luaV_equalobj_ (lua_State *L, const TValue *t1, const TValue *t2) { + const TValue *tm; + lua_assert(ttisequal(t1, t2)); + switch (ttype(t1)) { + case LUA_TNIL: return 1; + case LUA_TNUMBER: return luai_numeq(nvalue(t1), nvalue(t2)); + case LUA_TBOOLEAN: return bvalue(t1) == bvalue(t2); /* true must be 1 !! */ + case LUA_TLIGHTUSERDATA: return pvalue(t1) == pvalue(t2); + case LUA_TLCF: return fvalue(t1) == fvalue(t2); + case LUA_TSHRSTR: return eqshrstr(rawtsvalue(t1), rawtsvalue(t2)); + case LUA_TLNGSTR: return luaS_eqlngstr(rawtsvalue(t1), rawtsvalue(t2)); + case LUA_TUSERDATA: { + if (uvalue(t1) == uvalue(t2)) return 1; + else if (L == NULL) return 0; + tm = get_equalTM(L, uvalue(t1)->metatable, uvalue(t2)->metatable, TM_EQ); + break; /* will try TM */ + } + case LUA_TTABLE: { + if (hvalue(t1) == hvalue(t2)) return 1; + else if (L == NULL) return 0; + tm = get_equalTM(L, hvalue(t1)->metatable, hvalue(t2)->metatable, TM_EQ); + break; /* will try TM */ + } + default: + lua_assert(iscollectable(t1)); + return gcvalue(t1) == gcvalue(t2); + } + if (tm == NULL) return 0; /* no TM? */ + callTM(L, tm, t1, t2, L->top, 1); /* call TM */ + return !l_isfalse(L->top); +} + + +void luaV_concat (lua_State *L, int total) { + lua_assert(total >= 2); + do { + StkId top = L->top; + int n = 2; /* number of elements handled in this pass (at least 2) */ + if (!(ttisstring(top-2) || ttisnumber(top-2)) || !tostring(L, top-1)) { + if (!call_binTM(L, top-2, top-1, top-2, TM_CONCAT)) + luaG_concaterror(L, top-2, top-1); + } + else if (tsvalue(top-1)->len == 0) /* second operand is empty? */ + (void)tostring(L, top - 2); /* result is first operand */ + else if (ttisstring(top-2) && tsvalue(top-2)->len == 0) { + setobjs2s(L, top - 2, top - 1); /* result is second op. 
*/ + } + else { + /* at least two non-empty string values; get as many as possible */ + size_t tl = tsvalue(top-1)->len; + char *buffer; + int i; + /* collect total length */ + for (i = 1; i < total && tostring(L, top-i-1); i++) { + size_t l = tsvalue(top-i-1)->len; + if (l >= (MAX_SIZET/sizeof(char)) - tl) + luaG_runerror(L, "string length overflow"); + tl += l; + } + buffer = luaZ_openspace(L, &G(L)->buff, tl); + tl = 0; + n = i; + do { /* concat all strings */ + size_t l = tsvalue(top-i)->len; + memcpy(buffer+tl, svalue(top-i), l * sizeof(char)); + tl += l; + } while (--i > 0); + setsvalue2s(L, top-n, luaS_newlstr(L, buffer, tl)); + } + total -= n-1; /* got 'n' strings to create 1 new */ + L->top -= n-1; /* popped 'n' strings and pushed one */ + } while (total > 1); /* repeat until only 1 result left */ +} + + +void luaV_objlen (lua_State *L, StkId ra, const TValue *rb) { + const TValue *tm; + switch (ttypenv(rb)) { + case LUA_TTABLE: { + Table *h = hvalue(rb); + tm = fasttm(L, h->metatable, TM_LEN); + if (tm) break; /* metamethod? break switch to call it */ + setnvalue(ra, cast_num(luaH_getn(h))); /* else primitive len */ + return; + } + case LUA_TSTRING: { + setnvalue(ra, cast_num(tsvalue(rb)->len)); + return; + } + default: { /* try metamethod */ + tm = luaT_gettmbyobj(L, rb, TM_LEN); + if (ttisnil(tm)) /* no metamethod? */ + luaG_typeerror(L, rb, "get length of"); + break; + } + } + callTM(L, tm, rb, rb, ra, 1); +} + + +void luaV_arith (lua_State *L, StkId ra, const TValue *rb, + const TValue *rc, TMS op) { + TValue tempb, tempc; + const TValue *b, *c; + if ((b = luaV_tonumber(rb, &tempb)) != NULL && + (c = luaV_tonumber(rc, &tempc)) != NULL) { + lua_Number res = luaO_arith(op - TM_ADD + LUA_OPADD, nvalue(b), nvalue(c)); + setnvalue(ra, res); + } + else if (!call_binTM(L, rb, rc, ra, op)) + luaG_aritherror(L, rb, rc); +} + + +/* +** check whether cached closure in prototype 'p' may be reused, that is, +** whether there is a cached closure with the same upvalues needed by +** new closure to be created. +*/ +static Closure *getcached (Proto *p, UpVal **encup, StkId base) { + Closure *c = p->cache; + if (c != NULL) { /* is there a cached closure? */ + int nup = p->sizeupvalues; + Upvaldesc *uv = p->upvalues; + int i; + for (i = 0; i < nup; i++) { /* check whether it has right upvalues */ + TValue *v = uv[i].instack ? base + uv[i].idx : encup[uv[i].idx]->v; + if (c->l.upvals[i]->v != v) + return NULL; /* wrong upvalue; cannot reuse closure */ + } + } + return c; /* return cached closure (or NULL if no cached closure) */ +} + + +/* +** create a new Lua closure, push it in the stack, and initialize +** its upvalues. Note that the call to 'luaC_barrierproto' must come +** before the assignment to 'p->cache', as the function needs the +** original value of that field. +*/ +static void pushclosure (lua_State *L, Proto *p, UpVal **encup, StkId base, + StkId ra) { + int nup = p->sizeupvalues; + Upvaldesc *uv = p->upvalues; + int i; + Closure *ncl = luaF_newLclosure(L, nup); + ncl->l.p = p; + setclLvalue(L, ra, ncl); /* anchor new closure in stack */ + for (i = 0; i < nup; i++) { /* fill in its upvalues */ + if (uv[i].instack) /* upvalue refers to local variable? 
*/ + ncl->l.upvals[i] = luaF_findupval(L, base + uv[i].idx); + else /* get upvalue from enclosing function */ + ncl->l.upvals[i] = encup[uv[i].idx]; + } + luaC_barrierproto(L, p, ncl); + p->cache = ncl; /* save it on cache for reuse */ +} + + +/* +** finish execution of an opcode interrupted by an yield +*/ +void luaV_finishOp (lua_State *L) { + CallInfo *ci = L->ci; + StkId base = ci->u.l.base; + Instruction inst = *(ci->u.l.savedpc - 1); /* interrupted instruction */ + OpCode op = GET_OPCODE(inst); + switch (op) { /* finish its execution */ + case OP_ADD: case OP_SUB: case OP_MUL: case OP_DIV: + case OP_MOD: case OP_POW: case OP_UNM: case OP_LEN: + case OP_GETTABUP: case OP_GETTABLE: case OP_SELF: { + setobjs2s(L, base + GETARG_A(inst), --L->top); + break; + } + case OP_LE: case OP_LT: case OP_EQ: { + int res = !l_isfalse(L->top - 1); + L->top--; + /* metamethod should not be called when operand is K */ + lua_assert(!ISK(GETARG_B(inst))); + if (op == OP_LE && /* "<=" using "<" instead? */ + ttisnil(luaT_gettmbyobj(L, base + GETARG_B(inst), TM_LE))) + res = !res; /* invert result */ + lua_assert(GET_OPCODE(*ci->u.l.savedpc) == OP_JMP); + if (res != GETARG_A(inst)) /* condition failed? */ + ci->u.l.savedpc++; /* skip jump instruction */ + break; + } + case OP_CONCAT: { + StkId top = L->top - 1; /* top when 'call_binTM' was called */ + int b = GETARG_B(inst); /* first element to concatenate */ + int total = cast_int(top - 1 - (base + b)); /* yet to concatenate */ + setobj2s(L, top - 2, top); /* put TM result in proper position */ + if (total > 1) { /* are there elements to concat? */ + L->top = top - 1; /* top is one after last element (at top-2) */ + luaV_concat(L, total); /* concat them (may yield again) */ + } + /* move final result to final position */ + setobj2s(L, ci->u.l.base + GETARG_A(inst), L->top - 1); + L->top = ci->top; /* restore top */ + break; + } + case OP_TFORCALL: { + lua_assert(GET_OPCODE(*ci->u.l.savedpc) == OP_TFORLOOP); + L->top = ci->top; /* correct top */ + break; + } + case OP_CALL: { + if (GETARG_C(inst) - 1 >= 0) /* nresults >= 0? */ + L->top = ci->top; /* adjust results */ + break; + } + case OP_TAILCALL: case OP_SETTABUP: case OP_SETTABLE: + break; + default: lua_assert(0); + } +} + + + +/* +** some macros for common tasks in `luaV_execute' +*/ + +#if !defined luai_runtimecheck +#define luai_runtimecheck(L, c) /* void */ +#endif + + +#define RA(i) (base+GETARG_A(i)) +/* to be used after possible stack reallocation */ +#define RB(i) check_exp(getBMode(GET_OPCODE(i)) == OpArgR, base+GETARG_B(i)) +#define RC(i) check_exp(getCMode(GET_OPCODE(i)) == OpArgR, base+GETARG_C(i)) +#define RKB(i) check_exp(getBMode(GET_OPCODE(i)) == OpArgK, \ + ISK(GETARG_B(i)) ? k+INDEXK(GETARG_B(i)) : base+GETARG_B(i)) +#define RKC(i) check_exp(getCMode(GET_OPCODE(i)) == OpArgK, \ + ISK(GETARG_C(i)) ? k+INDEXK(GETARG_C(i)) : base+GETARG_C(i)) +#define KBx(i) \ + (k + (GETARG_Bx(i) != 0 ? 
GETARG_Bx(i) - 1 : GETARG_Ax(*ci->u.l.savedpc++))) + + +/* execute a jump instruction */ +#define dojump(ci,i,e) \ + { int a = GETARG_A(i); \ + if (a > 0) luaF_close(L, ci->u.l.base + a - 1); \ + ci->u.l.savedpc += GETARG_sBx(i) + e; } + +/* for test instructions, execute the jump instruction that follows it */ +#define donextjump(ci) { i = *ci->u.l.savedpc; dojump(ci, i, 1); } + + +#define Protect(x) { {x;}; base = ci->u.l.base; } + +#define checkGC(L,c) \ + Protect( luaC_condGC(L,{L->top = (c); /* limit of live values */ \ + luaC_step(L); \ + L->top = ci->top;}) /* restore top */ \ + luai_threadyield(L); ) + + +#define arith_op(op,tm) { \ + TValue *rb = RKB(i); \ + TValue *rc = RKC(i); \ + if (ttisnumber(rb) && ttisnumber(rc)) { \ + lua_Number nb = nvalue(rb), nc = nvalue(rc); \ + setnvalue(ra, op(L, nb, nc)); \ + } \ + else { Protect(luaV_arith(L, ra, rb, rc, tm)); } } + + +#define vmdispatch(o) switch(o) +#define vmcase(l,b) case l: {b} break; +#define vmcasenb(l,b) case l: {b} /* nb = no break */ + +void luaV_execute (lua_State *L) { + CallInfo *ci = L->ci; + LClosure *cl; + TValue *k; + StkId base; + newframe: /* reentry point when frame changes (call/return) */ + lua_assert(ci == L->ci); + cl = clLvalue(ci->func); + k = cl->p->k; + base = ci->u.l.base; + /* main loop of interpreter */ + for (;;) { + Instruction i = *(ci->u.l.savedpc++); + StkId ra; + if ((L->hookmask & (LUA_MASKLINE | LUA_MASKCOUNT)) && + (--L->hookcount == 0 || L->hookmask & LUA_MASKLINE)) { + Protect(traceexec(L)); + } + /* WARNING: several calls may realloc the stack and invalidate `ra' */ + ra = RA(i); + lua_assert(base == ci->u.l.base); + lua_assert(base <= L->top && L->top < L->stack + L->stacksize); + vmdispatch (GET_OPCODE(i)) { + vmcase(OP_MOVE, + setobjs2s(L, ra, RB(i)); + ) + vmcase(OP_LOADK, + TValue *rb = k + GETARG_Bx(i); + setobj2s(L, ra, rb); + ) + vmcase(OP_LOADKX, + TValue *rb; + lua_assert(GET_OPCODE(*ci->u.l.savedpc) == OP_EXTRAARG); + rb = k + GETARG_Ax(*ci->u.l.savedpc++); + setobj2s(L, ra, rb); + ) + vmcase(OP_LOADBOOL, + setbvalue(ra, GETARG_B(i)); + if (GETARG_C(i)) ci->u.l.savedpc++; /* skip next instruction (if C) */ + ) + vmcase(OP_LOADNIL, + int b = GETARG_B(i); + do { + setnilvalue(ra++); + } while (b--); + ) + vmcase(OP_GETUPVAL, + int b = GETARG_B(i); + setobj2s(L, ra, cl->upvals[b]->v); + ) + vmcase(OP_GETTABUP, + int b = GETARG_B(i); + Protect(luaV_gettable(L, cl->upvals[b]->v, RKC(i), ra)); + ) + vmcase(OP_GETTABLE, + Protect(luaV_gettable(L, RB(i), RKC(i), ra)); + ) + vmcase(OP_SETTABUP, + int a = GETARG_A(i); + Protect(luaV_settable(L, cl->upvals[a]->v, RKB(i), RKC(i))); + ) + vmcase(OP_SETUPVAL, + UpVal *uv = cl->upvals[GETARG_B(i)]; + setobj(L, uv->v, ra); + luaC_barrier(L, uv, ra); + ) + vmcase(OP_SETTABLE, + Protect(luaV_settable(L, ra, RKB(i), RKC(i))); + ) + vmcase(OP_NEWTABLE, + int b = GETARG_B(i); + int c = GETARG_C(i); + Table *t = luaH_new(L); + sethvalue(L, ra, t); + if (b != 0 || c != 0) + luaH_resize(L, t, luaO_fb2int(b), luaO_fb2int(c)); + checkGC(L, ra + 1); + ) + vmcase(OP_SELF, + StkId rb = RB(i); + setobjs2s(L, ra+1, rb); + Protect(luaV_gettable(L, rb, RKC(i), ra)); + ) + vmcase(OP_ADD, + arith_op(luai_numadd, TM_ADD); + ) + vmcase(OP_SUB, + arith_op(luai_numsub, TM_SUB); + ) + vmcase(OP_MUL, + arith_op(luai_nummul, TM_MUL); + ) + vmcase(OP_DIV, + arith_op(luai_numdiv, TM_DIV); + ) + vmcase(OP_MOD, + arith_op(luai_nummod, TM_MOD); + ) + vmcase(OP_POW, + arith_op(luai_numpow, TM_POW); + ) + vmcase(OP_UNM, + TValue *rb = RB(i); + if (ttisnumber(rb)) { + 
lua_Number nb = nvalue(rb); + setnvalue(ra, luai_numunm(L, nb)); + } + else { + Protect(luaV_arith(L, ra, rb, rb, TM_UNM)); + } + ) + vmcase(OP_NOT, + TValue *rb = RB(i); + int res = l_isfalse(rb); /* next assignment may change this value */ + setbvalue(ra, res); + ) + vmcase(OP_LEN, + Protect(luaV_objlen(L, ra, RB(i))); + ) + vmcase(OP_CONCAT, + int b = GETARG_B(i); + int c = GETARG_C(i); + StkId rb; + L->top = base + c + 1; /* mark the end of concat operands */ + Protect(luaV_concat(L, c - b + 1)); + ra = RA(i); /* 'luav_concat' may invoke TMs and move the stack */ + rb = b + base; + setobjs2s(L, ra, rb); + checkGC(L, (ra >= rb ? ra + 1 : rb)); + L->top = ci->top; /* restore top */ + ) + vmcase(OP_JMP, + dojump(ci, i, 0); + ) + vmcase(OP_EQ, + TValue *rb = RKB(i); + TValue *rc = RKC(i); + Protect( + if (cast_int(equalobj(L, rb, rc)) != GETARG_A(i)) + ci->u.l.savedpc++; + else + donextjump(ci); + ) + ) + vmcase(OP_LT, + Protect( + if (luaV_lessthan(L, RKB(i), RKC(i)) != GETARG_A(i)) + ci->u.l.savedpc++; + else + donextjump(ci); + ) + ) + vmcase(OP_LE, + Protect( + if (luaV_lessequal(L, RKB(i), RKC(i)) != GETARG_A(i)) + ci->u.l.savedpc++; + else + donextjump(ci); + ) + ) + vmcase(OP_TEST, + if (GETARG_C(i) ? l_isfalse(ra) : !l_isfalse(ra)) + ci->u.l.savedpc++; + else + donextjump(ci); + ) + vmcase(OP_TESTSET, + TValue *rb = RB(i); + if (GETARG_C(i) ? l_isfalse(rb) : !l_isfalse(rb)) + ci->u.l.savedpc++; + else { + setobjs2s(L, ra, rb); + donextjump(ci); + } + ) + vmcase(OP_CALL, + int b = GETARG_B(i); + int nresults = GETARG_C(i) - 1; + if (b != 0) L->top = ra+b; /* else previous instruction set top */ + if (luaD_precall(L, ra, nresults)) { /* C function? */ + if (nresults >= 0) L->top = ci->top; /* adjust results */ + base = ci->u.l.base; + } + else { /* Lua function */ + ci = L->ci; + ci->callstatus |= CIST_REENTRY; + goto newframe; /* restart luaV_execute over new Lua function */ + } + ) + vmcase(OP_TAILCALL, + int b = GETARG_B(i); + if (b != 0) L->top = ra+b; /* else previous instruction set top */ + lua_assert(GETARG_C(i) - 1 == LUA_MULTRET); + if (luaD_precall(L, ra, LUA_MULTRET)) /* C function? 
*/ + base = ci->u.l.base; + else { + /* tail call: put called frame (n) in place of caller one (o) */ + CallInfo *nci = L->ci; /* called frame */ + CallInfo *oci = nci->previous; /* caller frame */ + StkId nfunc = nci->func; /* called function */ + StkId ofunc = oci->func; /* caller function */ + /* last stack slot filled by 'precall' */ + StkId lim = nci->u.l.base + getproto(nfunc)->numparams; + int aux; + /* close all upvalues from previous call */ + if (cl->p->sizep > 0) luaF_close(L, oci->u.l.base); + /* move new frame into old one */ + for (aux = 0; nfunc + aux < lim; aux++) + setobjs2s(L, ofunc + aux, nfunc + aux); + oci->u.l.base = ofunc + (nci->u.l.base - nfunc); /* correct base */ + oci->top = L->top = ofunc + (L->top - nfunc); /* correct top */ + oci->u.l.savedpc = nci->u.l.savedpc; + oci->callstatus |= CIST_TAIL; /* function was tail called */ + ci = L->ci = oci; /* remove new frame */ + lua_assert(L->top == oci->u.l.base + getproto(ofunc)->maxstacksize); + goto newframe; /* restart luaV_execute over new Lua function */ + } + ) + vmcasenb(OP_RETURN, + int b = GETARG_B(i); + if (b != 0) L->top = ra+b-1; + if (cl->p->sizep > 0) luaF_close(L, base); + b = luaD_poscall(L, ra); + if (!(ci->callstatus & CIST_REENTRY)) /* 'ci' still the called one */ + return; /* external invocation: return */ + else { /* invocation via reentry: continue execution */ + ci = L->ci; + if (b) L->top = ci->top; + lua_assert(isLua(ci)); + lua_assert(GET_OPCODE(*((ci)->u.l.savedpc - 1)) == OP_CALL); + goto newframe; /* restart luaV_execute over new Lua function */ + } + ) + vmcase(OP_FORLOOP, + lua_Number step = nvalue(ra+2); + lua_Number idx = luai_numadd(L, nvalue(ra), step); /* increment index */ + lua_Number limit = nvalue(ra+1); + if (luai_numlt(L, 0, step) ? luai_numle(L, idx, limit) + : luai_numle(L, limit, idx)) { + ci->u.l.savedpc += GETARG_sBx(i); /* jump back */ + setnvalue(ra, idx); /* update internal index... */ + setnvalue(ra+3, idx); /* ...and external index */ + } + ) + vmcase(OP_FORPREP, + const TValue *init = ra; + const TValue *plimit = ra+1; + const TValue *pstep = ra+2; + if (!tonumber(init, ra)) + luaG_runerror(L, LUA_QL("for") " initial value must be a number"); + else if (!tonumber(plimit, ra+1)) + luaG_runerror(L, LUA_QL("for") " limit must be a number"); + else if (!tonumber(pstep, ra+2)) + luaG_runerror(L, LUA_QL("for") " step must be a number"); + setnvalue(ra, luai_numsub(L, nvalue(ra), nvalue(pstep))); + ci->u.l.savedpc += GETARG_sBx(i); + ) + vmcasenb(OP_TFORCALL, + StkId cb = ra + 3; /* call base */ + setobjs2s(L, cb+2, ra+2); + setobjs2s(L, cb+1, ra+1); + setobjs2s(L, cb, ra); + L->top = cb + 3; /* func. + 2 args (state and index) */ + Protect(luaD_call(L, cb, GETARG_C(i), 1)); + L->top = ci->top; + i = *(ci->u.l.savedpc++); /* go to next instruction */ + ra = RA(i); + lua_assert(GET_OPCODE(i) == OP_TFORLOOP); + goto l_tforloop; + ) + vmcase(OP_TFORLOOP, + l_tforloop: + if (!ttisnil(ra + 1)) { /* continue loop? */ + setobjs2s(L, ra, ra + 1); /* save control variable */ + ci->u.l.savedpc += GETARG_sBx(i); /* jump back */ + } + ) + vmcase(OP_SETLIST, + int n = GETARG_B(i); + int c = GETARG_C(i); + int last; + Table *h; + if (n == 0) n = cast_int(L->top - ra) - 1; + if (c == 0) { + lua_assert(GET_OPCODE(*ci->u.l.savedpc) == OP_EXTRAARG); + c = GETARG_Ax(*ci->u.l.savedpc++); + } + luai_runtimecheck(L, ttistable(ra)); + h = hvalue(ra); + last = ((c-1)*LFIELDS_PER_FLUSH) + n; + if (last > h->sizearray) /* needs more space? 
*/ + luaH_resizearray(L, h, last); /* pre-allocate it at once */ + for (; n > 0; n--) { + TValue *val = ra+n; + luaH_setint(L, h, last--, val); + luaC_barrierback(L, obj2gco(h), val); + } + L->top = ci->top; /* correct top (in case of previous open call) */ + ) + vmcase(OP_CLOSURE, + Proto *p = cl->p->p[GETARG_Bx(i)]; + Closure *ncl = getcached(p, cl->upvals, base); /* cached closure */ + if (ncl == NULL) /* no match? */ + pushclosure(L, p, cl->upvals, base, ra); /* create a new one */ + else + setclLvalue(L, ra, ncl); /* push cashed closure */ + checkGC(L, ra + 1); + ) + vmcase(OP_VARARG, + int b = GETARG_B(i) - 1; + int j; + int n = cast_int(base - ci->func) - cl->p->numparams - 1; + if (b < 0) { /* B == 0? */ + b = n; /* get all var. arguments */ + Protect(luaD_checkstack(L, n)); + ra = RA(i); /* previous call may change the stack */ + L->top = ra + n; + } + for (j = 0; j < b; j++) { + if (j < n) { + setobjs2s(L, ra + j, base - n + j); + } + else { + setnilvalue(ra + j); + } + } + ) + vmcase(OP_EXTRAARG, + lua_assert(0); + ) + } + } +} + diff --git a/ext/lua/src/lzio.c b/ext/lua/src/lzio.c new file mode 100644 index 000000000..8b77054e0 --- /dev/null +++ b/ext/lua/src/lzio.c @@ -0,0 +1,76 @@ +/* +** $Id: lzio.c,v 1.35 2012/05/14 13:34:18 roberto Exp $ +** Buffered streams +** See Copyright Notice in lua.h +*/ + + +#include + +#define lzio_c +#define LUA_CORE + +#include "lua.h" + +#include "llimits.h" +#include "lmem.h" +#include "lstate.h" +#include "lzio.h" + + +int luaZ_fill (ZIO *z) { + size_t size; + lua_State *L = z->L; + const char *buff; + lua_unlock(L); + buff = z->reader(L, z->data, &size); + lua_lock(L); + if (buff == NULL || size == 0) + return EOZ; + z->n = size - 1; /* discount char being returned */ + z->p = buff; + return cast_uchar(*(z->p++)); +} + + +void luaZ_init (lua_State *L, ZIO *z, lua_Reader reader, void *data) { + z->L = L; + z->reader = reader; + z->data = data; + z->n = 0; + z->p = NULL; +} + + +/* --------------------------------------------------------------- read --- */ +size_t luaZ_read (ZIO *z, void *b, size_t n) { + while (n) { + size_t m; + if (z->n == 0) { /* no bytes in buffer? */ + if (luaZ_fill(z) == EOZ) /* try to read more */ + return n; /* no more input; return number of missing bytes */ + else { + z->n++; /* luaZ_fill consumed first byte; put it back */ + z->p--; + } + } + m = (n <= z->n) ? n : z->n; /* min. between n and z->n */ + memcpy(b, z->p, m); + z->n -= m; + z->p += m; + b = (char *)b + m; + n -= m; + } + return 0; +} + +/* ------------------------------------------------------------------------ */ +char *luaZ_openspace (lua_State *L, Mbuffer *buff, size_t n) { + if (n > buff->buffsize) { + if (n < LUA_MINBUFFER) n = LUA_MINBUFFER; + luaZ_resizebuffer(L, buff, n); + } + return buff->buffer; +} + + diff --git a/filters/csv b/filters/csv new file mode 100755 index 000000000..626916b0d --- /dev/null +++ b/filters/csv @@ -0,0 +1,113 @@ +#!/usr/bin/perl -w + +use strict; +use warnings; + +my $FILTERTYPE = 'csv'; + +my $SEP = ','; +my $NL = "\n"; + +if ($#ARGV < 1) { + die "Filter failed! 
Please report bug.\n"; +} + +my $filename = $ARGV[0]; +my $fileType = $ARGV[1]; +my $infile = $filename; + +open INFILE,"< $filename"; +$filename =~ s/\.tmp/\.$FILTERTYPE/; +open OUTFILE,"> $filename"; + +if ($fileType eq 'topology') { + my $region = 'topo'; + print OUTFILE 'THREADS'.$NL; + + while () { + + if (/Cache Topology/) { + $region = 'cache'; + print OUTFILE 'CACHES'.$NL; + } elsif (/NUMA Topology/) { + $region = 'numa'; + print OUTFILE 'NUMA'.$NL; + } + + if ($region eq 'topo') { + if (/(CPU type):\t(.*)/) { + print OUTFILE $1.$SEP.$2.$NL; + } + elsif (/([A-Za-z ]*):\t([0-9]*)/) { + print OUTFILE $1.$SEP.$2.$NL; + } elsif (/(HWThread)\t(Thread)\t\t(Core)\t\t(Socket)/) { + print OUTFILE $1.$SEP.$2.$SEP.$3.$SEP.$4.$NL; + } elsif (/([0-9]*)\t\t([0-9]*)\t\t([0-9]*)\t\t([0-9]*)/) { + print OUTFILE $1.$SEP.$2.$SEP.$3.$SEP.$4.$NL; + } + } elsif ($region eq 'cache') { + if (/(Size):\t([0-9]*) ([kMB]*)/) { + my $size = $2; + if ($3 eq 'MB') { + $size *= 1024; + } + print OUTFILE $1.'[kB]'.$SEP.$size.$NL; + } elsif (/(Cache groups):\t*(.*)/) { + my @groups = split('\) \(',$2); + + my $grpId = 0; + foreach (@groups) { + /([0-9 ]+)/; + print OUTFILE 'Cache group '.$grpId.$SEP.$1.$NL; + $grpId++; + } + } elsif (/(.*):\t*(.*)/) { + print OUTFILE $1.$SEP.$2.$NL; + } + } elsif ($region eq 'numa') { + if (/Domain ([0-9]*)/) { + print OUTFILE 'Domain ID'.$SEP.$1.$NL; + } elsif (/Memory:.*total ([0-9.]+) MB/) { + print OUTFILE 'Memory [MB]'.$SEP.$1.$NL; + } elsif (/(.*):\t*[ ]*(.*)/) { + print OUTFILE $1.$SEP.$2.$NL; + } + } + } +} elsif ($fileType eq 'perfctr') { + my $header = 0; + while () { + if (/Event[ ]*\|[ ]*(core.*)\|/) { + if (not $header) { + my @col = split('\|',$1); + my $numcol = $#col+1; + print OUTFILE 'NumColumns'.$SEP.$numcol.$NL; + print OUTFILE 'Event/Metric'; + foreach (@col) { + s/[ ]//g; + print OUTFILE $SEP.$_; + } + print OUTFILE $NL; + $header = 1; + } + }elsif (/STAT/) { + + }elsif (/\|[ ]+([A-Z0-9_]+)[ ]+\|[ ]*(.*)\|/) { + my @col = split('\|',$2); + print OUTFILE $1; + foreach (@col) { + s/[ ]//g; + print OUTFILE $SEP.$_; + } + print OUTFILE $NL; + } + } +} else { + die "Filter failed! Unknown application type $fileType!\n"; +} + +unlink($infile); +close INFILE; +close OUTFILE; + + diff --git a/filters/template b/filters/template new file mode 100755 index 000000000..290ebfcf3 --- /dev/null +++ b/filters/template @@ -0,0 +1,30 @@ +#!/usr/bin/perl -w + +use strict; +use warnings; + +my $FILTERTYPE = 'csv'; + +if ($#ARGV < 2) { + die "Filter failed! Please report bug.\n"; +} + +my $filename = $ARGV[0]; +my $fileType = $ARGV[1]; + +open INFILE,"< $filename"; +$filename =~ s/\.tmp/\.$FILTERTYPE/; +open OUTFILE,"> $filename"; + + +if ($fileType eq 'topology') { + + +} elsif ($fileType eq 'perfctr') { + + +} else { + die "Filter failed! Unknown application type $fileType!\n"; +} + + diff --git a/filters/xml b/filters/xml new file mode 100755 index 000000000..23eaf8ec9 --- /dev/null +++ b/filters/xml @@ -0,0 +1,123 @@ +#!/usr/bin/perl -w + +use strict; +use warnings; + +my $FILTERTYPE = 'xml'; + +my $NL = "\n"; + +if ($#ARGV < 1) { + die "Filter failed! 
Please report bug.\n"; +} + +my $filename = $ARGV[0]; +my $fileType = $ARGV[1]; +my $infile = $filename; + +open INFILE,"< $filename"; +$filename =~ s/\.tmp/\.$FILTERTYPE/; +open OUTFILE,"> $filename"; + + +if ($fileType eq 'topology') { + my $region = 'topo'; + print OUTFILE ''.$NL; + + while () { + + if (/Cache Topology/) { + $region = 'cache'; + print OUTFILE ''.$NL; + } elsif (/NUMA Topology/) { + print OUTFILE ''.$NL; + print OUTFILE ''.$NL; + $region = 'numa'; + } + + if ($region eq 'topo') { + if (/(CPU type):\t(.*)/) { + print OUTFILE ''.$2.''.$NL; + } elsif (/(Sockets):\t(.*)/) { + print OUTFILE ''.$2.''.$NL; + } elsif (/(Cores per socket):\t(.*)/) { + print OUTFILE ''.$2.''.$NL; + } elsif (/(Threads per core):\t(.*)/) { + print OUTFILE ''.$2.''.$NL; + } elsif (/([0-9]*)\t\t([0-9]*)\t\t([0-9]*)\t\t([0-9]*)/) { + #TODO Build tree for XML output from table! + } + } elsif ($region eq 'cache') { + if (/(Size):\t([0-9]*) ([kMB]*)/) { + my $size = $2; + if ($3 eq 'MB') { + $size *= 1024; + } + print OUTFILE ''.$size.''.$NL; + } elsif (/(Cache groups):\t*(.*)/) { + print OUTFILE ''.$NL; + } elsif (/(Associativity):\t*(.*)/) { + print OUTFILE ''.$2.''.$NL; + } elsif (/(Number of sets):\t*(.*)/) { + print OUTFILE ''.$2.''.$NL; + } elsif (/(Cache line size):\t*(.*)/) { + print OUTFILE ''.$2.''.$NL; + } elsif (/(Level):\t*(.*)/) { + print OUTFILE ''.$NL; + print OUTFILE ''.$2.''.$NL; + } + } elsif ($region eq 'numa') { + if (/Domain ([0-9]*)/) { + print OUTFILE ''.$NL; + print OUTFILE ''.$1.''.$NL; + } elsif (/Memory:.*total ([0-9.]+) MB/) { + print OUTFILE ''.$1.''.$NL; + } elsif (/Processors:[ ]+([0-9. ]+)/) { + print OUTFILE ''.$1.''.$NL; + } + } + } + + print OUTFILE ''.$NL; + print OUTFILE ''.$NL; +} elsif ($fileType eq 'perfctr') { + my $header = 0; + my @col; + print OUTFILE ''.$NL; + while () { + if (/Event[ ]*\|[ ]*(core.*)\|/) { + if (not $header) { + @col = split('\|',$1); + foreach (@col) { + s/core //g; + s/[ ]//g; + } + $header = 1; + } + }elsif (/STAT/) { + + }elsif (/\|[ ]+([A-Z0-9_]+)[ ]+\|[ ]*(.*)\|/) { + my @rescol = split('\|',$2); + my $id = 0; + print OUTFILE ''.$NL; + print OUTFILE ''.$1.''.$NL; + foreach (@rescol) { + s/[ ]//g; + print OUTFILE ''.$NL; + print OUTFILE ''.$col[$id].''.$NL; + print OUTFILE ''.$_.''.$NL; + print OUTFILE ''.$NL; + $id++; + } + print OUTFILE ''.$NL; + } + } + print OUTFILE ''.$NL; +} else { + die "Filter failed! 
Unknown application type $fileType!\n"; +} + +unlink($infile); +close INFILE; +close OUTFILE; + diff --git a/groups/atom/BRANCH.txt b/groups/atom/BRANCH.txt new file mode 100644 index 000000000..51d2ddd2d --- /dev/null +++ b/groups/atom/BRANCH.txt @@ -0,0 +1,19 @@ +SHORT Branch prediction miss rate/ratio + +EVENTSET +FIXC0 INSTR_RETIRED_ANY +FIXC1 CPU_CLK_UNHALTED_CORE +PMC0 BR_INST_RETIRED_ANY +PMC1 BR_INST_RETIRED_MISPRED + +METRICS +Runtime [s] FIXC1*inverseClock +CPI FIXC1/FIXC0 +Branch rate PMC0/FIXC0 +Branch misprediction rate PMC1/FIXC0 +Branch misprediction ratio PMC1/PMC0 +Instructions per branch FIXC0/PMC0 + +LONG +Bla Bla + diff --git a/groups/atom/DATA.txt b/groups/atom/DATA.txt new file mode 100644 index 000000000..1c0f4ae73 --- /dev/null +++ b/groups/atom/DATA.txt @@ -0,0 +1,16 @@ +SHORT Load to store ratio + +EVENTSET +FIXC0 INSTR_RETIRED_ANY +FIXC1 CPU_CLK_UNHALTED_CORE +PMC0 L1D_CACHE_LD +PMC1 L1D_CACHE_ST + +METRICS +Runtime [s] FIXC1*inverseClock +CPI FIXC1/FIXC0 +Load to Store ratio PMC0/PMC1 + +LONG +Bla Bla + diff --git a/groups/atom/FLOPS_DP.txt b/groups/atom/FLOPS_DP.txt new file mode 100644 index 000000000..12905c6e4 --- /dev/null +++ b/groups/atom/FLOPS_DP.txt @@ -0,0 +1,17 @@ +SHORT Double Precision MFlops/s + +EVENTSET +FIXC0 INSTR_RETIRED_ANY +FIXC1 CPU_CLK_UNHALTED_CORE +PMC0 SIMD_COMP_INST_RETIRED_PACKED_DOUBLE +PMC1 SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE + +METRICS +Runtime [s] FIXC1*inverseClock +CPI FIXC1/FIXC0 +DP MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time + + +LONG +Double Precision MFlops/s Double Precision MFlops/s + diff --git a/groups/atom/FLOPS_SP.txt b/groups/atom/FLOPS_SP.txt new file mode 100644 index 000000000..f064f38ff --- /dev/null +++ b/groups/atom/FLOPS_SP.txt @@ -0,0 +1,16 @@ +SHORT Single Precision MFlops/s + +EVENTSET +FIXC0 INSTR_RETIRED_ANY +FIXC1 CPU_CLK_UNHALTED_CORE +PMC0 SIMD_COMP_INST_RETIRED_PACKED_SINGLE +PMC1 SIMD_COMP_INST_RETIRED_SCALAR_SINGLE + +METRICS +Runtime [s] FIXC1*inverseClock +CPI FIXC1/FIXC0 +SP MFlops/s (SP assumed) 1.0E-06*(PMC0*4.0+PMC1)/time + +LONG +Single Precision MFlops/s Double Precision MFlops/s + diff --git a/groups/atom/FLOPS_X87.txt b/groups/atom/FLOPS_X87.txt new file mode 100644 index 000000000..ad14a4d8e --- /dev/null +++ b/groups/atom/FLOPS_X87.txt @@ -0,0 +1,15 @@ +SHORT X87 MFlops/s + +EVENTSET +FIXC0 INSTR_RETIRED_ANY +FIXC1 CPU_CLK_UNHALTED_CORE +PMC0 X87_COMP_OPS_EXE_ANY_AR + +METRICS +Runtime [s] FIXC1*inverseClock +CPI FIXC1/FIXC0 +X87 MFlops/s 1.0E-06*PMC0/time + +LONG +X87 MFlops/s + diff --git a/groups/atom/MEM.txt b/groups/atom/MEM.txt new file mode 100644 index 000000000..faf9a0af3 --- /dev/null +++ b/groups/atom/MEM.txt @@ -0,0 +1,15 @@ +SHORT Main memory bandwidth in MBytes/s + +EVENTSET +FIXC0 INSTR_RETIRED_ANY +FIXC1 CPU_CLK_UNHALTED_CORE +PMC0 BUS_TRANS_MEM_THIS_CORE_THIS_A + +METRICS +Runtime [s] FIXC1*inverseClock +CPI FIXC1/FIXC0 +Memory bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time + +LONG +Bla Bla + diff --git a/groups/atom/TLB.txt b/groups/atom/TLB.txt new file mode 100644 index 000000000..d36b41357 --- /dev/null +++ b/groups/atom/TLB.txt @@ -0,0 +1,15 @@ +SHORT TLB miss rate + +EVENTSET +FIXC0 INSTR_RETIRED_ANY +FIXC1 CPU_CLK_UNHALTED_CORE +PMC0 DATA_TLB_MISSES_DTLB_MISS + +METRICS +Runtime [s] FIXC1*inverseClock +CPI FIXC1/FIXC0 +DTLB miss rate PMC0/FIXC0 + +LONG +Bla Bla + diff --git a/groups/core2/BRANCH.txt b/groups/core2/BRANCH.txt new file mode 100644 index 000000000..15a9ae033 --- /dev/null +++ b/groups/core2/BRANCH.txt @@ -0,0 +1,28 @@ +SHORT Branch prediction miss rate/ratio 
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 BR_INST_RETIRED_ANY
+PMC1 BR_INST_RETIRED_MISPRED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+Branch rate PMC0/FIXC0
+Branch misprediction rate PMC1/FIXC0
+Branch misprediction ratio PMC1/PMC0
+Instructions per branch FIXC0/PMC0
+
+LONG
+Formulas:
+Branch rate = BR_INST_RETIRED_ANY / INSTR_RETIRED_ANY
+Branch misprediction rate = BR_INST_RETIRED_MISPRED / INSTR_RETIRED_ANY
+Branch misprediction ratio = BR_INST_RETIRED_MISPRED / BR_INST_RETIRED_ANY
+Instructions per branch = INSTR_RETIRED_ANY / BR_INST_RETIRED_ANY
+-
+The rates state how often on average a branch or a mispredicted branch occurred
+per retired instruction. The Branch misprediction ratio directly expresses
+what fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate.
diff --git a/groups/core2/CACHE.txt b/groups/core2/CACHE.txt
new file mode 100644
index 000000000..26e310ccb
--- /dev/null
+++ b/groups/core2/CACHE.txt
@@ -0,0 +1,33 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 L1D_REPL
+PMC1 L1D_ALL_CACHE_REF
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+Data cache misses PMC0
+Data cache request rate PMC1/FIXC0
+Data cache miss rate PMC0/FIXC0
+Data cache miss ratio PMC0/PMC1
+
+LONG
+Formulas:
+Data cache request rate = L1D_ALL_CACHE_REF / INSTR_RETIRED_ANY
+Data cache miss rate = L1D_REPL / INSTR_RETIRED_ANY
+Data cache miss ratio = L1D_REPL / L1D_ALL_CACHE_REF
+-
+This group measures the locality of your data accesses with regard to the
+L1 Cache. Data cache request rate tells you how data intensive your code is,
+or how many data accesses you have on average per instruction.
+The Data cache miss rate gives a measure of how often it was necessary to get
+cachelines from higher levels of the memory hierarchy. And finally the
+Data cache miss ratio tells you how many of your memory references required
+a cacheline to be loaded from a higher level. While the Data cache miss rate
+might be given by your algorithm, you should try to get the Data cache miss ratio
+as low as possible by increasing your cache reuse.
+
diff --git a/groups/core2/DATA.txt b/groups/core2/DATA.txt
new file mode 100644
index 000000000..af77c1e1e
--- /dev/null
+++ b/groups/core2/DATA.txt
@@ -0,0 +1,20 @@
+SHORT Load to store ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 INST_RETIRED_LOADS
+PMC1 INST_RETIRED_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = INST_RETIRED_LOADS / INST_RETIRED_STORES
+-
+This is a simple metric to determine your load to store ratio.
+
diff --git a/groups/core2/FLOPS_DP.txt b/groups/core2/FLOPS_DP.txt
new file mode 100644
index 000000000..81e30b378
--- /dev/null
+++ b/groups/core2/FLOPS_DP.txt
@@ -0,0 +1,22 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 SIMD_COMP_INST_RETIRED_PACKED_DOUBLE
+PMC1 SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+DP MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+
+LONG
+Formulas:
+DP MFlops/s = 1.0E-06*(SIMD_COMP_INST_RETIRED_PACKED_DOUBLE*2+SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE)/time
+-
+Profiling group to measure double precision SSE flops.
+Don't forget that your code might also execute X87 flops.
+From the number of SIMD_COMP_INST_RETIRED_PACKED_DOUBLE you can see how well your code was vectorized.
+
+
diff --git a/groups/core2/FLOPS_SP.txt b/groups/core2/FLOPS_SP.txt
new file mode 100644
index 000000000..92c95bbae
--- /dev/null
+++ b/groups/core2/FLOPS_SP.txt
@@ -0,0 +1,22 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 SIMD_COMP_INST_RETIRED_PACKED_SINGLE
+PMC1 SIMD_COMP_INST_RETIRED_SCALAR_SINGLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+SP MFlops/s 1.0E-06*(PMC0*4.0+PMC1)/time
+
+LONG
+Formulas:
+SP MFlops/s = 1.0E-06*(SIMD_COMP_INST_RETIRED_PACKED_SINGLE*4+SIMD_COMP_INST_RETIRED_SCALAR_SINGLE)/time
+-
+Profiling group to measure single precision SSE flops.
+Don't forget that your code might also execute X87 flops.
+From the number of SIMD_COMP_INST_RETIRED_PACKED_SINGLE you can see how well your code was vectorized.
+
+
diff --git a/groups/core2/FLOPS_X87.txt b/groups/core2/FLOPS_X87.txt
new file mode 100644
index 000000000..1bcd4d6eb
--- /dev/null
+++ b/groups/core2/FLOPS_X87.txt
@@ -0,0 +1,20 @@
+SHORT X87 MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 X87_OPS_RETIRED_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+X87 MFlops/s 1.0E-06*PMC0/time
+
+LONG
+Formulas:
+X87 MFlops/s = 1.0E-06*X87_OPS_RETIRED_ANY/time
+-
+Profiling group to measure X87 flops. Note that non-computational operations
+are also measured by this event.
+
diff --git a/groups/core2/L2.txt b/groups/core2/L2.txt
new file mode 100644
index 000000000..8436400d3
--- /dev/null
+++ b/groups/core2/L2.txt
@@ -0,0 +1,30 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 L1D_REPL
+PMC1 L1D_M_EVICT
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+L2 load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L2 Load [MBytes/s] = 1.0E-06*L1D_REPL*64/time
+L2 Evict [MBytes/s] = 1.0E-06*L1D_M_EVICT*64/time
+L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPL+L1D_M_EVICT)*64/time
+L2 data volume [GBytes] = 1.0E-09*(L1D_REPL+L1D_M_EVICT)*64.0
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is
+computed from the number of cachelines allocated in the L1 and the
+number of modified cachelines evicted from the L1.
+Note that this bandwidth also includes data transfers due to a
+write-allocate load on a store miss in L1.
+
diff --git a/groups/core2/L2CACHE.txt b/groups/core2/L2CACHE.txt
new file mode 100644
index 000000000..dbbed5d83
--- /dev/null
+++ b/groups/core2/L2CACHE.txt
@@ -0,0 +1,33 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 L2_RQSTS_THIS_CORE_ALL_MESI
+PMC1 L2_RQSTS_SELF_I_STATE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+L2 request rate PMC0/FIXC0
+L2 miss rate PMC1/FIXC0
+L2 miss ratio PMC1/PMC0
+
+LONG
+Formulas:
+L2 request rate = L2_RQSTS_THIS_CORE_ALL_MESI / INSTR_RETIRED_ANY
+L2 miss rate = L2_RQSTS_SELF_I_STATE / INSTR_RETIRED_ANY
+L2 miss ratio = L2_RQSTS_SELF_I_STATE / L2_RQSTS_THIS_CORE_ALL_MESI
+-
+This group measures the locality of your data accesses with regard to the
+L2 Cache. L2 request rate tells you how data intensive your code is,
+or how many data accesses you have on average per instruction.
+The L2 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. And finally L2 miss ratio tells you how many of your
+memory references required a cacheline to be loaded from a higher level.
+While the L2 miss rate might be given by your algorithm, you should
+try to get the L2 miss ratio as low as possible by increasing your cache reuse.
+Note: This group might need to be revised!
+
+
diff --git a/groups/core2/MEM.txt b/groups/core2/MEM.txt
new file mode 100644
index 000000000..8f193d66a
--- /dev/null
+++ b/groups/core2/MEM.txt
@@ -0,0 +1,20 @@
+SHORT Main memory bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 BUS_TRANS_MEM_THIS_CORE_THIS_A
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+Memory bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+Memory data volume [GBytes] 1.0E-09*PMC0*64.0
+
+LONG
+Formulas:
+Memory bandwidth [MBytes/s] = 1.0E-06*BUS_TRANS_MEM_THIS_CORE_THIS_A*64/time
+Memory data volume [GBytes] = 1.0E-09*BUS_TRANS_MEM_THIS_CORE_THIS_A*64.0
+-
+Profiling group to measure the memory bandwidth drawn by this core.
diff --git a/groups/core2/TLB.txt b/groups/core2/TLB.txt
new file mode 100644
index 000000000..f36abfe15
--- /dev/null
+++ b/groups/core2/TLB.txt
@@ -0,0 +1,28 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+PMC0 DTLB_MISSES_ANY
+PMC1 L1D_ALL_CACHE_REF
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+CPI FIXC1/FIXC0
+L1 DTLB request rate PMC1/FIXC0
+DTLB miss rate PMC0/FIXC0
+L1 DTLB miss ratio PMC0/PMC1
+
+LONG
+Formulas:
+L1 DTLB request rate = L1D_ALL_CACHE_REF / INSTR_RETIRED_ANY
+DTLB miss rate = DTLB_MISSES_ANY / INSTR_RETIRED_ANY
+L1 DTLB miss ratio = DTLB_MISSES_ANY / L1D_ALL_CACHE_REF
+-
+L1 DTLB request rate tells you how data intensive your code is,
+or how many data accesses you have on average per instruction.
+The DTLB miss rate gives a measure of how often a TLB miss occurred
+per instruction. And finally the L1 DTLB miss ratio tells you how many
+of your memory references actually caused a TLB miss on average.
+
diff --git a/groups/haswell/BRANCH.txt b/groups/haswell/BRANCH.txt
new file mode 100644
index 000000000..cbaf83451
--- /dev/null
+++ b/groups/haswell/BRANCH.txt
@@ -0,0 +1,31 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 BR_INST_RETIRED_ALL_BRANCHES
+PMC1 BR_MISP_RETIRED_ALL_BRANCHES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Branch rate PMC0/FIXC0
+Branch misprediction rate PMC1/FIXC0
+Branch misprediction ratio PMC1/PMC0
+Instructions per branch FIXC0/PMC0
+
+LONG
+Formulas:
+Branch rate = BR_INST_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES / BR_INST_RETIRED_ALL_BRANCHES
+Instructions per branch = INSTR_RETIRED_ANY / BR_INST_RETIRED_ALL_BRANCHES
+-
+The rates state how often on average a branch or a mispredicted branch occurred
+per retired instruction. The Branch misprediction ratio directly expresses
+what ratio of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate.
+
diff --git a/groups/haswell/DATA.txt b/groups/haswell/DATA.txt
new file mode 100644
index 000000000..5f04a23a8
--- /dev/null
+++ b/groups/haswell/DATA.txt
@@ -0,0 +1,22 @@
+SHORT Load to store ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 MEM_UOP_RETIRED_LOADS
+PMC1 MEM_UOP_RETIRED_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = MEM_UOP_RETIRED_LOADS / MEM_UOP_RETIRED_STORES
+-
+This is a metric to determine your load to store ratio.
+
diff --git a/groups/haswell/ENERGY.txt b/groups/haswell/ENERGY.txt
new file mode 100644
index 000000000..276cf165c
--- /dev/null
+++ b/groups/haswell/ENERGY.txt
@@ -0,0 +1,23 @@
+SHORT Power and Energy consumption
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PWR0 PWR_PKG_ENERGY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Energy [J] PWR0
+Power [W] PWR0/time
+
+LONG
+Formula:
+Power = PWR_PKG_ENERGY / time
+-
+Haswell implements the new RAPL interface, which makes it possible to
+monitor the energy consumed at the package (socket) level.
+
diff --git a/groups/haswell/L2CACHE.txt b/groups/haswell/L2CACHE.txt
new file mode 100644
index 000000000..3d7c36ea1
--- /dev/null
+++ b/groups/haswell/L2CACHE.txt
@@ -0,0 +1,35 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_TRANS_ALL_REQUESTS
+PMC1 L2_RQSTS_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 request rate PMC0/FIXC0
+L2 miss rate PMC1/FIXC0
+L2 miss ratio PMC1/PMC0
+
+LONG
+Formulas:
+L2 request rate = L2_TRANS_ALL_REQUESTS / INSTR_RETIRED_ANY
+L2 miss rate = L2_RQSTS_MISS / INSTR_RETIRED_ANY
+L2 miss ratio = L2_RQSTS_MISS / L2_TRANS_ALL_REQUESTS
+-
+This group measures the locality of your data accesses with regard to the
+L2 Cache. L2 request rate tells you how data intensive your code is,
+or how many data accesses you have on average per instruction.
+The L2 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. And finally L2 miss ratio tells you how many of your
+memory references required a cacheline to be loaded from a higher level.
+While the L2 miss rate might be given by your algorithm, you should
+try to get the L2 miss ratio as low as possible by increasing your cache reuse.
+Note: This group might need to be revised!
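
The derived metrics in these group files are plain arithmetic over the raw
counter readings. As a rough illustration of the L2CACHE formulas above (a
sketch, not LIKWID code; all counter values below are invented), the same
computation in C:

#include <stdio.h>

/* Sketch: evaluate the L2CACHE metrics from raw counter values.
   The inputs are hypothetical readings of INSTR_RETIRED_ANY (FIXC0),
   L2_TRANS_ALL_REQUESTS (PMC0) and L2_RQSTS_MISS (PMC1); the divisions
   follow the METRICS/LONG sections of the group file above. */
int main(void) {
    double instr_retired = 1.0e9;  /* FIXC0: INSTR_RETIRED_ANY (assumed) */
    double l2_requests   = 5.0e7;  /* PMC0: L2_TRANS_ALL_REQUESTS (assumed) */
    double l2_misses     = 1.0e6;  /* PMC1: L2_RQSTS_MISS (assumed) */

    double request_rate = l2_requests / instr_retired;  /* PMC0/FIXC0 */
    double miss_rate    = l2_misses / instr_retired;    /* PMC1/FIXC0 */
    double miss_ratio   = l2_misses / l2_requests;      /* PMC1/PMC0 */

    printf("L2 request rate: %g\n", request_rate);
    printf("L2 miss rate:    %g\n", miss_rate);
    printf("L2 miss ratio:   %g\n", miss_ratio);
    return 0;
}

Note the different denominators: a "rate" is normalized to retired
instructions, a "ratio" to the requests themselves.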
+
+
diff --git a/groups/haswell/L3.txt b/groups/haswell/L3.txt
new file mode 100644
index 000000000..42d6e4a1d
--- /dev/null
+++ b/groups/haswell/L3.txt
@@ -0,0 +1,32 @@
+SHORT L3 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_LINES_IN_ALL
+PMC1 L2_LINES_OUT_DEMAND_DIRTY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L3 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L3 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L3 Load [MBytes/s] = 1.0E-06*L2_LINES_IN_ALL*64/time
+L3 Evict [MBytes/s] = 1.0E-06*L2_LINES_OUT_DEMAND_DIRTY*64/time
+L3 bandwidth [MBytes/s] = 1.0E-06*(L2_LINES_IN_ALL+L2_LINES_OUT_DEMAND_DIRTY)*64/time
+L3 data volume [GBytes] = 1.0E-09*(L2_LINES_IN_ALL+L2_LINES_OUT_DEMAND_DIRTY)*64
+-
+Profiling group to measure L3 cache bandwidth. The bandwidth is computed from the
+number of cachelines allocated in the L2 and the number of modified cachelines
+evicted from the L2. This group also outputs the data volume transferred between the
+L3 and the measured core's L2 cache. Note that this bandwidth also includes data
+transfers due to a write-allocate load on a store miss in L2.
+
diff --git a/groups/haswell/TLB.txt b/groups/haswell/TLB.txt
new file mode 100644
index 000000000..78bf096bb
--- /dev/null
+++ b/groups/haswell/TLB.txt
@@ -0,0 +1,22 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 DTLB_LOAD_MISSES_CAUSES_A_WALK
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L1 DTLB miss rate PMC0/FIXC0
+
+LONG
+Formulas:
+DTLB miss rate = DTLB_LOAD_MISSES_CAUSES_A_WALK / INSTR_RETIRED_ANY
+-
+The DTLB miss rate gives a measure of how often a TLB miss occurred
+per instruction.
+
diff --git a/groups/interlagos/BRANCH.txt b/groups/interlagos/BRANCH.txt
new file mode 100644
index 000000000..1ae9f36a4
--- /dev/null
+++ b/groups/interlagos/BRANCH.txt
@@ -0,0 +1,32 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 RETIRED_BRANCH_INSTR
+PMC2 RETIRED_MISPREDICTED_BRANCH_INSTR
+PMC3 RETIRED_TAKEN_BRANCH_INSTR
+
+METRICS
+Runtime (RDTSC) [s] time
+Branch rate PMC1/PMC0
+Branch misprediction rate PMC2/PMC0
+Branch misprediction ratio PMC2/PMC1
+Branch taken rate PMC3/PMC0
+Branch taken ratio PMC3/PMC1
+Instructions per branch PMC0/PMC1
+
+LONG
+Formulas:
+Branch rate = RETIRED_BRANCH_INSTR / RETIRED_INSTRUCTIONS
+Branch misprediction rate = RETIRED_MISPREDICTED_BRANCH_INSTR / RETIRED_INSTRUCTIONS
+Branch misprediction ratio = RETIRED_MISPREDICTED_BRANCH_INSTR / RETIRED_BRANCH_INSTR
+Branch taken rate = RETIRED_TAKEN_BRANCH_INSTR / RETIRED_INSTRUCTIONS
+Branch taken ratio = RETIRED_TAKEN_BRANCH_INSTR / RETIRED_BRANCH_INSTR
+Instructions per branch = RETIRED_INSTRUCTIONS / RETIRED_BRANCH_INSTR
+-
+The rates state how often on average a branch or a mispredicted branch occurred
+per retired instruction. The branch misprediction ratio directly expresses
+what ratio of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate. The same applies to the branch
+taken metrics.
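
The bandwidth groups above (for example core2/L2 and haswell/L3) all use the
same conversion: a cacheline counter is scaled by 64 bytes per line and by
1.0E-06 or 1.0E-09 to obtain MBytes/s or GBytes. A minimal sketch of that
arithmetic (all counter readings and the runtime are made-up values):

#include <stdio.h>

/* Sketch: cacheline counts -> bandwidth, following the L2/L3 group
   formulas above. lines_in/lines_out and time are hypothetical. */
int main(void) {
    double lines_in  = 8.0e8;  /* e.g. L2_LINES_IN_ALL (assumed) */
    double lines_out = 2.0e8;  /* e.g. L2_LINES_OUT_DEMAND_DIRTY (assumed) */
    double time      = 1.5;    /* runtime in seconds (assumed) */

    double load_mbs  = 1.0e-6 * lines_in * 64.0 / time;             /* load */
    double evict_mbs = 1.0e-6 * lines_out * 64.0 / time;            /* evict */
    double bw_mbs    = 1.0e-6 * (lines_in + lines_out) * 64.0 / time;
    double vol_gb    = 1.0e-9 * (lines_in + lines_out) * 64.0;

    printf("load %.1f MB/s, evict %.1f MB/s, total %.1f MB/s, volume %.2f GB\n",
           load_mbs, evict_mbs, bw_mbs, vol_gb);
    return 0;
}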
+
diff --git a/groups/interlagos/CACHE.txt b/groups/interlagos/CACHE.txt
new file mode 100644
index 000000000..23343a56a
--- /dev/null
+++ b/groups/interlagos/CACHE.txt
@@ -0,0 +1,32 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 DATA_CACHE_ACCESSES
+PMC2 DATA_CACHE_REFILLS_VALID
+PMC3 DATA_CACHE_MISSES_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Data cache misses PMC3
+Data cache request rate PMC1/PMC0
+Data cache miss rate (PMC2)/PMC0
+Data cache miss ratio (PMC2)/PMC1
+
+LONG
+Formulas:
+Data cache misses = DATA_CACHE_MISSES_ALL
+Data cache request rate = DATA_CACHE_ACCESSES / RETIRED_INSTRUCTIONS
+Data cache miss rate = (DATA_CACHE_REFILLS_VALID) / RETIRED_INSTRUCTIONS
+Data cache miss ratio = (DATA_CACHE_REFILLS_VALID)/DATA_CACHE_ACCESSES
+-
+This group measures the locality of your data accesses with regard to the
+L1 cache. The data cache request rate tells you how data intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The data cache miss rate gives a measure of how often it was necessary to get
+cachelines from higher levels of the memory hierarchy. Finally, the
+data cache miss ratio tells you how many of your memory references required
+a cacheline to be loaded from a higher level. While the data cache miss rate
+might be given by your algorithm, you should try to get the data cache miss ratio
+as low as possible by increasing your cache reuse.
+
diff --git a/groups/interlagos/CPI.txt b/groups/interlagos/CPI.txt
new file mode 100644
index 000000000..47711f45b
--- /dev/null
+++ b/groups/interlagos/CPI.txt
@@ -0,0 +1,21 @@
+SHORT Cycles per instruction
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 CPU_CLOCKS_UNHALTED
+PMC2 RETIRED_UOPS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+CPI PMC1/PMC0
+CPI (based on uops) PMC1/PMC2
+IPC PMC0/PMC1
+
+LONG
+This group measures how efficiently the processor works with
+regard to instruction throughput. Also important as a standalone
+metric is RETIRED_INSTRUCTIONS as it tells you how many instructions
+you need to execute for a task. An optimization might show very
+low CPI values but execute many more instructions for it.
+
diff --git a/groups/interlagos/DATA.txt b/groups/interlagos/DATA.txt
new file mode 100644
index 000000000..78e4c3c81
--- /dev/null
+++ b/groups/interlagos/DATA.txt
@@ -0,0 +1,16 @@
+SHORT Load to store ratio
+
+EVENTSET
+PMC0 LS_DISPATCH_LOADS
+PMC1 LS_DISPATCH_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = LS_DISPATCH_LOADS / LS_DISPATCH_STORES
+-
+This is a simple metric to determine your load to store ratio.
+
diff --git a/groups/interlagos/FLOPS_DP.txt b/groups/interlagos/FLOPS_DP.txt
new file mode 100644
index 000000000..d7f5f57cb
--- /dev/null
+++ b/groups/interlagos/FLOPS_DP.txt
@@ -0,0 +1,23 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 CPU_CLOCKS_UNHALTED
+PMC2 RETIRED_UOPS
+PMC3 RETIRED_FLOPS_DOUBLE_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+MFlops/s 1.0E-06*(PMC3)/time
+CPI PMC1/PMC0
+CPI (based on uops) PMC1/PMC2
+IPC PMC0/PMC1
+
+LONG
+Formulas:
+DP MFlops/s = 1.0E-06*(RETIRED_FLOPS_DOUBLE_ALL)/time
+-
+Profiling group to measure double precision flop rate.
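As a sanity check for the MFlops/s definition, the conversion from a raw flop count and a runtime to MFlops/s is just the 1.0E-06 scaling from the formula above. The sketch below assumes a flop count and runtime obtained elsewhere; both numbers are invented for illustration.

    retired_flops_double_all = 4_800_000_000  # hypothetical raw event count
    runtime_s = 2.0                           # wall-clock time of the measured region

    mflops = 1.0e-06 * retired_flops_double_all / runtime_s
    print(f"{mflops:.1f} MFlops/s")           # 2400.0 MFlops/s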
+
+
diff --git a/groups/interlagos/FLOPS_SP.txt b/groups/interlagos/FLOPS_SP.txt
new file mode 100644
index 000000000..1c4dcc371
--- /dev/null
+++ b/groups/interlagos/FLOPS_SP.txt
@@ -0,0 +1,23 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 CPU_CLOCKS_UNHALTED
+PMC2 RETIRED_UOPS
+PMC3 RETIRED_FLOPS_SINGLE_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+MFlops/s 1.0E-06*(PMC3)/time
+CPI PMC1/PMC0
+CPI (based on uops) PMC1/PMC2
+IPC PMC0/PMC1
+
+LONG
+Formulas:
+SP MFlops/s = 1.0E-06*(RETIRED_FLOPS_SINGLE_ALL)/time
+-
+Profiling group to measure single precision flop rate.
+
+
diff --git a/groups/interlagos/FPU_EXCEPTION.txt b/groups/interlagos/FPU_EXCEPTION.txt
new file mode 100644
index 000000000..5c586e454
--- /dev/null
+++ b/groups/interlagos/FPU_EXCEPTION.txt
@@ -0,0 +1,21 @@
+SHORT Floating point exceptions
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 RETIRED_FP_INSTRUCTIONS_ALL
+PMC2 FPU_EXCEPTION_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Overall FP exception rate PMC2/PMC0
+FP exception rate PMC2/PMC1
+
+LONG
+Formulas:
+Overall FP exception rate = FPU_EXCEPTION_ALL / RETIRED_INSTRUCTIONS
+FP exception rate = FPU_EXCEPTION_ALL / RETIRED_FP_INSTRUCTIONS_ALL
+-
+Floating point exceptions occur e.g. on the treatment of denormals.
+There might be a large penalty if there are too many floating point
+exceptions.
+
diff --git a/groups/interlagos/ICACHE.txt b/groups/interlagos/ICACHE.txt
new file mode 100644
index 000000000..be5e5f591
--- /dev/null
+++ b/groups/interlagos/ICACHE.txt
@@ -0,0 +1,25 @@
+SHORT Instruction cache miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTION_CACHE_FETCHES
+PMC1 INSTRUCTION_CACHE_L2_REFILLS
+PMC2 INSTRUCTION_CACHE_SYSTEM_REFILLS
+PMC3 RETIRED_INSTRUCTIONS
+
+METRICS
+Runtime (RDTSC) [s] time
+Instruction cache misses PMC1+PMC2
+Instruction cache request rate PMC0/PMC3
+Instruction cache miss rate (PMC1+PMC2)/PMC3
+Instruction cache miss ratio (PMC1+PMC2)/PMC0
+
+LONG
+Formulas:
+Instruction cache misses = INSTRUCTION_CACHE_L2_REFILLS + INSTRUCTION_CACHE_SYSTEM_REFILLS
+Instruction cache request rate = INSTRUCTION_CACHE_FETCHES / RETIRED_INSTRUCTIONS
+Instruction cache miss rate = (INSTRUCTION_CACHE_L2_REFILLS + INSTRUCTION_CACHE_SYSTEM_REFILLS)/RETIRED_INSTRUCTIONS
+Instruction cache miss ratio = (INSTRUCTION_CACHE_L2_REFILLS + INSTRUCTION_CACHE_SYSTEM_REFILLS)/INSTRUCTION_CACHE_FETCHES
+-
+This group measures the locality of your instruction code with regard to the
+L1 I-cache.
+
diff --git a/groups/interlagos/L2.txt b/groups/interlagos/L2.txt
new file mode 100644
index 000000000..a1f57149b
--- /dev/null
+++ b/groups/interlagos/L2.txt
@@ -0,0 +1,29 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+PMC0 DATA_CACHE_REFILLS_ALL
+PMC1 DATA_CACHE_REFILLS_SYSTEM
+PMC2 CPU_CLOCKS_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0-PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0-PMC1)*64.0
+Cache refill bandwidth System/L2 [MBytes/s] 1.0E-06*PMC0*64.0/time
+Cache refill bandwidth System [MBytes/s] 1.0E-06*PMC1*64.0/time
+
+LONG
+Formulas:
+L2 bandwidth [MBytes/s] 1.0E-06*(DATA_CACHE_REFILLS_ALL-DATA_CACHE_REFILLS_SYSTEM)*64/time
+L2 data volume [GBytes] 1.0E-09*(DATA_CACHE_REFILLS_ALL-DATA_CACHE_REFILLS_SYSTEM)*64
+Cache refill bandwidth System/L2 [MBytes/s] 1.0E-06*DATA_CACHE_REFILLS_ALL*64/time
+Cache refill bandwidth System [MBytes/s] 1.0E-06*DATA_CACHE_REFILLS_SYSTEM*64/time
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is
+computed from the number of cachelines loaded from L2 to L1 and the
+number of modified cachelines evicted from the L1.
+Note that this bandwidth also includes data transfers due to a
+write allocate load on a store miss in L1 and copy back transfers if
+originated from L2. The L2-L1 data volume is the total data volume transferred
+between L2 and L1.
+
diff --git a/groups/interlagos/L2CACHE.txt b/groups/interlagos/L2CACHE.txt
new file mode 100644
index 000000000..17209e8d8
--- /dev/null
+++ b/groups/interlagos/L2CACHE.txt
@@ -0,0 +1,31 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 REQUESTS_TO_L2_DC_FILL
+PMC2 L2_CACHE_MISS_DC_FILL
+
+METRICS
+Runtime (RDTSC) [s] time
+L2 request rate (PMC1)/PMC0
+L2 miss rate PMC2/PMC0
+L2 miss ratio PMC2/(PMC1)
+
+LONG
+Formulas:
+L2 request rate = (REQUESTS_TO_L2_DC_FILL)/RETIRED_INSTRUCTIONS
+L2 miss rate = L2_CACHE_MISS_DC_FILL/RETIRED_INSTRUCTIONS
+L2 miss ratio = L2_CACHE_MISS_DC_FILL/(REQUESTS_TO_L2_DC_FILL)
+-
+This group measures the locality of your data accesses with regard to the L2
+cache. The L2 request rate tells you how data intensive your code is, i.e. how many
+data accesses you have on average per instruction. The L2 miss rate gives a
+measure of how often it was necessary to get cachelines from memory. Finally, the
+L2 miss ratio tells you how many of your memory references required a cacheline
+to be loaded from a higher level. While the data cache miss rate might be
+given by your algorithm, you should try to get the data cache miss ratio as low as
+possible by increasing your cache reuse. This group is inspired by the
+whitepaper -Basic Performance Measurements for AMD Athlon 64, AMD Opteron and
+AMD Phenom Processors- by Paul J. Drongowski.
+
+
diff --git a/groups/interlagos/L3.txt b/groups/interlagos/L3.txt
new file mode 100644
index 000000000..c1a6f1789
--- /dev/null
+++ b/groups/interlagos/L3.txt
@@ -0,0 +1,24 @@
+SHORT L3 cache bandwidth in MBytes/s
+
+EVENTSET
+PMC0 L2_FILL_WB_FILL
+PMC1 L2_FILL_WB_WB
+PMC2 CPU_CLOCKS_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+L3 refill bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+L3 evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+
+LONG
+Formulas:
+L3 bandwidth [MBytes/s] 1.0E-06*(L2_FILL_WB_FILL+L2_FILL_WB_WB)*64/time
+L3 data volume [GBytes] 1.0E-09*(L2_FILL_WB_FILL+L2_FILL_WB_WB)*64
+L3 refill bandwidth [MBytes/s] 1.0E-06*L2_FILL_WB_FILL*64/time
+-
+Profiling group to measure L3 cache bandwidth. The bandwidth is
+computed from the number of cachelines loaded from L3 to L2 and the
+number of modified cachelines evicted from the L2.
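All cache bandwidth groups in this patch share the same conversion: each counted event stands for one 64-byte cacheline, so bandwidth is events times 64 bytes over time, and data volume is the same product without the time division. A generic sketch with invented counts; the helper names are not part of any group file:

    CACHELINE_BYTES = 64

    def bandwidth_mbytes_per_s(cacheline_events: int, runtime_s: float) -> float:
        # 1.0E-06 * events * 64 / time, as in the group formulas
        return 1.0e-06 * cacheline_events * CACHELINE_BYTES / runtime_s

    def volume_gbytes(cacheline_events: int) -> float:
        # 1.0E-09 * events * 64
        return 1.0e-09 * cacheline_events * CACHELINE_BYTES

    loads, evicts = 50_000_000, 20_000_000  # hypothetical L2_FILL_WB_FILL / L2_FILL_WB_WB
    print(bandwidth_mbytes_per_s(loads + evicts, runtime_s=1.5))  # combined bandwidth
    print(volume_gbytes(loads + evicts))                          # combined data volume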
+
diff --git a/groups/interlagos/L3CACHE.txt b/groups/interlagos/L3CACHE.txt
new file mode 100644
index 000000000..4bef1a78c
--- /dev/null
+++ b/groups/interlagos/L3CACHE.txt
@@ -0,0 +1,35 @@
+SHORT L3 cache miss rate/ratio
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+UPMC0 UNC_READ_REQ_TO_L3_ALL
+UPMC1 UNC_L3_CACHE_MISS_ALL
+UPMC2 UNC_L3_LATENCY_CYCLE_COUNT
+UPMC3 UNC_L3_LATENCY_REQUEST_COUNT
+
+METRICS
+Runtime (RDTSC) [s] time
+L3 request rate UPMC0/PMC0
+L3 miss rate UPMC1/PMC0
+L3 miss ratio UPMC1/UPMC0
+L3 average access latency [cycles] UPMC2/UPMC3
+
+LONG
+Formulas:
+L3 request rate = (UNC_READ_REQ_TO_L3_ALL)/RETIRED_INSTRUCTIONS
+L3 miss rate = UNC_L3_CACHE_MISS_ALL/RETIRED_INSTRUCTIONS
+L3 miss ratio = UNC_L3_CACHE_MISS_ALL/UNC_READ_REQ_TO_L3_ALL
+L3 average access latency = UNC_L3_LATENCY_CYCLE_COUNT/UNC_L3_LATENCY_REQUEST_COUNT
+-
+This group measures the locality of your data accesses with regard to the L3
+cache. The L3 request rate tells you how data intensive your code is, i.e. how many
+data accesses you have on average per instruction. The L3 miss rate gives a
+measure of how often it was necessary to get cachelines from memory. Finally, the
+L3 miss ratio tells you how many of your memory references required a cacheline
+to be loaded from a higher level. While the data cache miss rate might be
+given by your algorithm, you should try to get the data cache miss ratio as low as
+possible by increasing your cache reuse. This group was inspired by the
+whitepaper -Basic Performance Measurements for AMD Athlon 64, AMD Opteron and
+AMD Phenom Processors- by Paul J. Drongowski.
+
+
diff --git a/groups/interlagos/LINKS.txt b/groups/interlagos/LINKS.txt
new file mode 100644
index 000000000..649f0d169
--- /dev/null
+++ b/groups/interlagos/LINKS.txt
@@ -0,0 +1,26 @@
+SHORT Bandwidth on the Hypertransport links
+
+EVENTSET
+UPMC0 UNC_LINK_TRANSMIT_BW_L0_USE
+UPMC1 UNC_LINK_TRANSMIT_BW_L1_USE
+UPMC2 UNC_LINK_TRANSMIT_BW_L2_USE
+UPMC3 UNC_LINK_TRANSMIT_BW_L3_USE
+
+METRICS
+Runtime (RDTSC) [s] time
+Link bandwidth L0 [MBytes/s] 1.0E-06*UPMC0*4.0/time
+Link bandwidth L1 [MBytes/s] 1.0E-06*UPMC1*4.0/time
+Link bandwidth L2 [MBytes/s] 1.0E-06*UPMC2*4.0/time
+Link bandwidth L3 [MBytes/s] 1.0E-06*UPMC3*4.0/time
+
+LONG
+Formulas:
+Link bandwidth L0 [MBytes/s] 1.0E-06*UNC_LINK_TRANSMIT_BW_L0_USE*4.0/time
+Link bandwidth L1 [MBytes/s] 1.0E-06*UNC_LINK_TRANSMIT_BW_L1_USE*4.0/time
+Link bandwidth L2 [MBytes/s] 1.0E-06*UNC_LINK_TRANSMIT_BW_L2_USE*4.0/time
+Link bandwidth L3 [MBytes/s] 1.0E-06*UNC_LINK_TRANSMIT_BW_L3_USE*4.0/time
+-
+Profiling group to measure the Hypertransport link bandwidth for the four links
+of a local node. This indicates the data flow between different ccNUMA nodes.
+
+
diff --git a/groups/interlagos/MEM.txt b/groups/interlagos/MEM.txt
new file mode 100644
index 000000000..22aa19ef5
--- /dev/null
+++ b/groups/interlagos/MEM.txt
@@ -0,0 +1,20 @@
+SHORT Main memory bandwidth in MBytes/s
+
+EVENTSET
+UPMC0 UNC_DRAM_ACCESSES_DCT0_ALL
+UPMC1 UNC_DRAM_ACCESSES_DCT1_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Memory bandwidth [MBytes/s] 1.0E-06*(UPMC0+UPMC1)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(UPMC0+UPMC1)*64.0
+
+LONG
+Formulas:
+Memory bandwidth [MBytes/s] = 1.0E-06*(UNC_DRAM_ACCESSES_DCT0_ALL+UNC_DRAM_ACCESSES_DCT1_ALL)*64/time
+Memory data volume [GBytes] = 1.0E-09*(UNC_DRAM_ACCESSES_DCT0_ALL+UNC_DRAM_ACCESSES_DCT1_ALL)*64
+-
+Profiling group to measure memory bandwidth drawn by all cores of a socket.
+Note: As this group measures the accesses from all cores, it only makes sense
+to measure with one core per socket, similar to the Intel Nehalem uncore events.
+
diff --git a/groups/interlagos/NUMA.txt b/groups/interlagos/NUMA.txt
new file mode 100644
index 000000000..d94e735dd
--- /dev/null
+++ b/groups/interlagos/NUMA.txt
@@ -0,0 +1,28 @@
+SHORT Read/Write Events between the ccNUMA nodes
+
+EVENTSET
+UPMC0 UNC_CPU_TO_DRAM_LOCAL_TO_0
+UPMC1 UNC_CPU_TO_DRAM_LOCAL_TO_1
+UPMC2 UNC_CPU_TO_DRAM_LOCAL_TO_2
+UPMC3 UNC_CPU_TO_DRAM_LOCAL_TO_3
+
+METRICS
+Runtime (RDTSC) [s] time
+DRAM read/write local to 0 [MegaEvents/s] 1.0E-06*UPMC0/time
+DRAM read/write local to 1 [MegaEvents/s] 1.0E-06*UPMC1/time
+DRAM read/write local to 2 [MegaEvents/s] 1.0E-06*UPMC2/time
+DRAM read/write local to 3 [MegaEvents/s] 1.0E-06*UPMC3/time
+
+LONG
+Formulas:
+DRAM read/write local to 0 [MegaEvents/s] 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_0/time
+DRAM read/write local to 1 [MegaEvents/s] 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_1/time
+DRAM read/write local to 2 [MegaEvents/s] 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_2/time
+DRAM read/write local to 3 [MegaEvents/s] 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_3/time
+-
+Profiling group to measure the traffic from the local CPU to the different
+DRAM NUMA nodes. This group helps to detect NUMA problems in a threaded
+code. You must first determine on which memory domains your code is running.
+A code should only have significant traffic to its own memory domain.
+
+
diff --git a/groups/ivybridge/BRANCH.txt b/groups/ivybridge/BRANCH.txt
new file mode 100644
index 000000000..cbaf83451
--- /dev/null
+++ b/groups/ivybridge/BRANCH.txt
@@ -0,0 +1,31 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 BR_INST_RETIRED_ALL_BRANCHES
+PMC1 BR_MISP_RETIRED_ALL_BRANCHES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Branch rate PMC0/FIXC0
+Branch misprediction rate PMC1/FIXC0
+Branch misprediction ratio PMC1/PMC0
+Instructions per branch FIXC0/PMC0
+
+LONG
+Formulas:
+Branch rate = BR_INST_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES / BR_INST_RETIRED_ALL_BRANCHES
+Instructions per branch = INSTR_RETIRED_ANY / BR_INST_RETIRED_ALL_BRANCHES
+-
+The rates state how often on average a branch or a mispredicted branch occurred
+per retired instruction. The branch misprediction ratio directly expresses what
+fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate.
+
diff --git a/groups/ivybridge/DATA.txt b/groups/ivybridge/DATA.txt
new file mode 100644
index 000000000..5f04a23a8
--- /dev/null
+++ b/groups/ivybridge/DATA.txt
@@ -0,0 +1,22 @@
+SHORT Load to store ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 MEM_UOP_RETIRED_LOADS
+PMC1 MEM_UOP_RETIRED_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = MEM_UOP_RETIRED_LOADS / MEM_UOP_RETIRED_STORES
+-
+This is a metric to determine your load to store ratio.
+
diff --git a/groups/ivybridge/ENERGY.txt b/groups/ivybridge/ENERGY.txt
new file mode 100644
index 000000000..169369dd8
--- /dev/null
+++ b/groups/ivybridge/ENERGY.txt
@@ -0,0 +1,31 @@
+SHORT Power and Energy consumption
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+TMP0 TEMP_CORE
+PWR0 PWR_PKG_ENERGY
+PWR3 PWR_DRAM_ENERGY
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE
+PMC2 FP_256_PACKED_DOUBLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+TEMP [C] TMP0
+Energy [J] PWR0
+Power [W] PWR0/time
+Energy DRAM [J] PWR3
+Power DRAM [W] PWR3/time
+AVX MFlops/s 1.0E-06*(4.0*PMC2)/time
+MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+
+LONG
+To be added
+
diff --git a/groups/ivybridge/FLOPS_AVX.txt b/groups/ivybridge/FLOPS_AVX.txt
new file mode 100644
index 000000000..2bc99ea1e
--- /dev/null
+++ b/groups/ivybridge/FLOPS_AVX.txt
@@ -0,0 +1,25 @@
+SHORT Packed AVX MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 SIMD_FP_256_PACKED_SINGLE
+PMC1 SIMD_FP_256_PACKED_DOUBLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+SP 32b packed MFlops/s 1.0E-06*(PMC0*8.0)/time
+DP 64b packed MFlops/s 1.0E-06*(PMC1*4.0)/time
+
+LONG
+Formula:
+SP MFlops/s = (SIMD_FP_256_PACKED_SINGLE*8)/ runtime
+DP MFlops/s = (SIMD_FP_256_PACKED_DOUBLE*4)/ runtime
+-
+AVX flop rates. Please note that the current flop measurements on IvyBridge are
+potentially wrong. So you cannot trust these counters at the moment!
+
diff --git a/groups/ivybridge/FLOPS_DP.txt b/groups/ivybridge/FLOPS_DP.txt
new file mode 100644
index 000000000..48bc0fab9
--- /dev/null
+++ b/groups/ivybridge/FLOPS_DP.txt
@@ -0,0 +1,28 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE
+PMC2 FP_256_PACKED_DOUBLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+AVX MFlops/s 1.0E-06*(PMC2*4.0)/time
+Packed MUOPS/s 1.0E-06*(PMC0+PMC2)/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+
+LONG
+Formula:
+MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE*2 + FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE)/ runtime
+AVX MFlops/s = (FP_256_PACKED_DOUBLE*4)/ runtime
+-
+SSE scalar and packed double precision flop rates. Please note that the current flop measurements on IvyBridge are
+potentially wrong. So you cannot trust these counters at the moment!
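The double precision flop count here is assembled from three event classes with different flops-per-event weights: packed SSE counts two flops per event, scalar SSE counts one, and 256-bit packed AVX counts four. A minimal sketch with hypothetical counts, mirroring the metric definitions above:

    packed_sse_dp = 100_000_000  # FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE (2 flops each)
    scalar_sse_dp = 50_000_000   # FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE (1 flop each)
    packed_avx_dp = 25_000_000   # FP_256_PACKED_DOUBLE (4 flops each)
    runtime_s = 1.0

    sse_mflops = 1.0e-06 * (packed_sse_dp * 2.0 + scalar_sse_dp) / runtime_s
    avx_mflops = 1.0e-06 * (packed_avx_dp * 4.0) / runtime_s
    print(f"SSE: {sse_mflops:.0f} MFlops/s, AVX: {avx_mflops:.0f} MFlops/s")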
+
diff --git a/groups/ivybridge/FLOPS_SP.txt b/groups/ivybridge/FLOPS_SP.txt
new file mode 100644
index 000000000..0be0721c0
--- /dev/null
+++ b/groups/ivybridge/FLOPS_SP.txt
@@ -0,0 +1,29 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE
+PMC2 SIMD_FP_256_PACKED_SINGLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+MFlops/s 1.0E-06*(PMC0*4.0+PMC1)/time
+32b AVX MFlops/s 1.0E-06*(PMC2*8.0)/time
+Packed MUOPS/s 1.0E-06*(PMC0+PMC2)/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+
+LONG
+Formula:
+MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE*4 + FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE)/ runtime
+AVX MFlops/s = (SIMD_FP_256_PACKED_SINGLE*8)/ runtime
+-
+SSE scalar and packed single precision flop rates. Please note that the current
+flop measurements on IvyBridge are potentially wrong. So you cannot trust
+these counters at the moment!
+
diff --git a/groups/ivybridge/L2.txt b/groups/ivybridge/L2.txt
new file mode 100644
index 000000000..5345b7aba
--- /dev/null
+++ b/groups/ivybridge/L2.txt
@@ -0,0 +1,32 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPLACEMENT
+PMC1 L1D_M_EVICT
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L2 Load [MBytes/s] = 1.0E-06*L1D_REPLACEMENT*64/time
+L2 Evict [MBytes/s] = 1.0E-06*L1D_M_EVICT*64/time
+L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPLACEMENT+L1D_M_EVICT)*64/time
+L2 data volume [GBytes] = 1.0E-09*(L1D_REPLACEMENT+L1D_M_EVICT)*64
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is computed from the
+number of cachelines allocated in the L1 and the number of modified cachelines
+evicted from the L1. The group also outputs the total data volume transferred between
+L2 and L1. Note that this bandwidth also includes data transfers due to a write
+allocate load on a store miss in L1.
+
diff --git a/groups/ivybridge/L2CACHE.txt b/groups/ivybridge/L2CACHE.txt
new file mode 100644
index 000000000..3d7c36ea1
--- /dev/null
+++ b/groups/ivybridge/L2CACHE.txt
@@ -0,0 +1,35 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_TRANS_ALL_REQUESTS
+PMC1 L2_RQSTS_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 request rate PMC0/FIXC0
+L2 miss rate PMC1/FIXC0
+L2 miss ratio PMC1/PMC0
+
+LONG
+Formulas:
+L2 request rate = L2_TRANS_ALL_REQUESTS / INSTR_RETIRED_ANY
+L2 miss rate = L2_RQSTS_MISS / INSTR_RETIRED_ANY
+L2 miss ratio = L2_RQSTS_MISS / L2_TRANS_ALL_REQUESTS
+-
+This group measures the locality of your data accesses with regard to the
+L2 cache. The L2 request rate tells you how data intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The L2 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. Finally, the L2 miss ratio tells you how many of your
+memory references required a cacheline to be loaded from a higher level.
+While the data cache miss rate might be given by your algorithm, you should
+try to get the data cache miss ratio as low as possible by increasing your cache reuse.
+Note: This group might need to be revised!
+
+
diff --git a/groups/ivybridge/L3.txt b/groups/ivybridge/L3.txt
new file mode 100644
index 000000000..9a7c914b7
--- /dev/null
+++ b/groups/ivybridge/L3.txt
@@ -0,0 +1,32 @@
+SHORT L3 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_LINES_IN_ALL
+PMC1 L2_LINES_OUT_DIRTY_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L3 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L3 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L3 Load [MBytes/s] 1.0E-06*L2_LINES_IN_ALL*64/time
+L3 Evict [MBytes/s] 1.0E-06*L2_LINES_OUT_DIRTY_ALL*64/time
+L3 bandwidth [MBytes/s] 1.0E-06*(L2_LINES_IN_ALL+L2_LINES_OUT_DIRTY_ALL)*64/time
+L3 data volume [GBytes] 1.0E-09*(L2_LINES_IN_ALL+L2_LINES_OUT_DIRTY_ALL)*64
+-
+Profiling group to measure L3 cache bandwidth. The bandwidth is computed from the
+number of cachelines allocated in the L2 and the number of modified cachelines
+evicted from the L2. This group also outputs the data volume transferred between the
+L3 and the measured cores' L2 caches. Note that this bandwidth also includes data
+transfers due to a write allocate load on a store miss in L2.
+
diff --git a/groups/ivybridge/MEM.txt b/groups/ivybridge/MEM.txt
new file mode 100644
index 000000000..300bdea2e
--- /dev/null
+++ b/groups/ivybridge/MEM.txt
@@ -0,0 +1,30 @@
+SHORT Main memory bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+MBOX0C0 CAS_COUNT_RD
+MBOX1C0 CAS_COUNT_WR
+MBOX0C1 CAS_COUNT_RD
+MBOX1C1 CAS_COUNT_WR
+MBOX0C2 CAS_COUNT_RD
+MBOX1C2 CAS_COUNT_WR
+MBOX0C3 CAS_COUNT_RD
+MBOX1C3 CAS_COUNT_WR
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Memory Read BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3)*64.0/time
+Memory Write BW [MBytes/s] 1.0E-06*(MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0
+
+LONG
+Profiling group to measure memory bandwidth drawn by all cores of a socket.
+Since this group is based on uncore events, it is only possible to measure on a
+per-socket basis. It also outputs the total data volume transferred from main memory.
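The memory group sums CAS counts over the four memory channels (MBOX0..MBOX3) separately for reads and writes before applying the 64-byte-per-access conversion. A minimal sketch, assuming the eight per-channel counts are already available as lists; all numbers are invented:

    cas_rd = [30_000_000, 29_000_000, 31_000_000, 30_500_000]  # hypothetical per-channel reads
    cas_wr = [10_000_000, 10_200_000,  9_800_000, 10_000_000]  # hypothetical per-channel writes
    runtime_s = 1.0

    read_bw = 1.0e-06 * sum(cas_rd) * 64.0 / runtime_s
    write_bw = 1.0e-06 * sum(cas_wr) * 64.0 / runtime_s
    print(f"read {read_bw:.0f} MB/s, write {write_bw:.0f} MB/s, "
          f"total {read_bw + write_bw:.0f} MB/s")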
+
diff --git a/groups/ivybridge/MEM_DP.txt b/groups/ivybridge/MEM_DP.txt
new file mode 100644
index 000000000..e26e6fbed
--- /dev/null
+++ b/groups/ivybridge/MEM_DP.txt
@@ -0,0 +1,41 @@
+SHORT Power and Energy consumption
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PWR0 PWR_PKG_ENERGY
+PWR3 PWR_DRAM_ENERGY
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE
+PMC2 FP_256_PACKED_DOUBLE
+MBOX0C0 CAS_COUNT_RD
+MBOX1C0 CAS_COUNT_WR
+MBOX0C1 CAS_COUNT_RD
+MBOX1C1 CAS_COUNT_WR
+MBOX0C2 CAS_COUNT_RD
+MBOX1C2 CAS_COUNT_WR
+MBOX0C3 CAS_COUNT_RD
+MBOX1C3 CAS_COUNT_WR
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Energy [J] PWR0
+Power [W] PWR0/time
+Energy DRAM [J] PWR3
+Power DRAM [W] PWR3/time
+AVX MFlops/s 1.0E-06*(4.0*PMC2)/time
+MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+Memory Read BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3)*64.0/time
+Memory Write BW [MBytes/s] 1.0E-06*(MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0
+
+LONG
+To be added
+
diff --git a/groups/ivybridge/MEM_SP.txt b/groups/ivybridge/MEM_SP.txt
new file mode 100644
index 000000000..5a17ae5c0
--- /dev/null
+++ b/groups/ivybridge/MEM_SP.txt
@@ -0,0 +1,41 @@
+SHORT Power and Energy consumption
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PWR0 PWR_PKG_ENERGY
+PWR3 PWR_DRAM_ENERGY
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE
+PMC2 FP_256_PACKED_SINGLE
+MBOX0C0 CAS_COUNT_RD
+MBOX1C0 CAS_COUNT_WR
+MBOX0C1 CAS_COUNT_RD
+MBOX1C1 CAS_COUNT_WR
+MBOX0C2 CAS_COUNT_RD
+MBOX1C2 CAS_COUNT_WR
+MBOX0C3 CAS_COUNT_RD
+MBOX1C3 CAS_COUNT_WR
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Energy [J] PWR0
+Power [W] PWR0/time
+Energy DRAM [J] PWR3
+Power DRAM [W] PWR3/time
+AVX MFlops/s 1.0E-06*(8.0*PMC2)/time
+MFlops/s 1.0E-06*(PMC0*4.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+Memory Read BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3)*64.0/time
+Memory Write BW [MBytes/s] 1.0E-06*(MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0
+
+LONG
+To be added
+
diff --git a/groups/ivybridge/TLB.txt b/groups/ivybridge/TLB.txt
new file mode 100644
index 000000000..78bf096bb
--- /dev/null
+++ b/groups/ivybridge/TLB.txt
@@ -0,0 +1,22 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 DTLB_LOAD_MISSES_CAUSES_A_WALK
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L1 DTLB miss rate PMC0/FIXC0
+
+LONG
+Formulas:
+DTLB miss rate = DTLB_LOAD_MISSES_CAUSES_A_WALK / INSTR_RETIRED_ANY
+-
+The DTLB miss rate gives a measure of how often a TLB miss occurred
+per instruction.
+
diff --git a/groups/k10/BRANCH.txt b/groups/k10/BRANCH.txt
new file mode 100644
index 000000000..cbc6da672
--- /dev/null
+++ b/groups/k10/BRANCH.txt
@@ -0,0 +1,32 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 BRANCH_RETIRED
+PMC2 BRANCH_MISPREDICT_RETIRED
+PMC3 BRANCH_TAKEN_RETIRED
+
+METRICS
+Runtime (RDTSC) [s] time
+Branch rate PMC1/PMC0
+Branch misprediction rate PMC2/PMC0
+Branch misprediction ratio PMC2/PMC1
+Branch taken rate PMC3/PMC0
+Branch taken ratio PMC3/PMC1
+Instructions per branch PMC0/PMC1
+
+LONG
+Formulas:
+Branch rate = BRANCH_RETIRED / INSTRUCTIONS_RETIRED
+Branch misprediction rate = BRANCH_MISPREDICT_RETIRED / INSTRUCTIONS_RETIRED
+Branch misprediction ratio = BRANCH_MISPREDICT_RETIRED / BRANCH_RETIRED
+Branch taken rate = BRANCH_TAKEN_RETIRED / INSTRUCTIONS_RETIRED
+Branch taken ratio = BRANCH_TAKEN_RETIRED / BRANCH_RETIRED
+Instructions per branch = INSTRUCTIONS_RETIRED / BRANCH_RETIRED
+-
+The rates state how often on average a branch or a mispredicted branch occurred
+per retired instruction. The branch misprediction ratio directly expresses what
+fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate. The same applies for the branches
+taken metrics.
+
diff --git a/groups/k10/CACHE.txt b/groups/k10/CACHE.txt
new file mode 100644
index 000000000..e70823ee2
--- /dev/null
+++ b/groups/k10/CACHE.txt
@@ -0,0 +1,34 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 DATA_CACHE_ACCESSES
+PMC2 DATA_CACHE_REFILLS_L2_ALL
+PMC3 DATA_CACHE_REFILLS_NORTHBRIDGE_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Data cache misses PMC2+PMC3
+Data cache request rate PMC1/PMC0
+Data cache miss rate (PMC2+PMC3)/PMC0
+Data cache miss ratio (PMC2+PMC3)/PMC1
+
+LONG
+Formulas:
+Data cache misses = DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL
+Data cache request rate = DATA_CACHE_ACCESSES / INSTRUCTIONS_RETIRED
+Data cache miss rate = (DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL)/INSTRUCTIONS_RETIRED
+Data cache miss ratio = (DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL)/DATA_CACHE_ACCESSES
+-
+This group measures the locality of your data accesses with regard to the
+L1 cache. The data cache request rate tells you how data intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The data cache miss rate gives a measure of how often it was necessary to get
+cachelines from higher levels of the memory hierarchy. Finally, the
+data cache miss ratio tells you how many of your memory references required
+a cacheline to be loaded from a higher level. While the data cache miss rate
+might be given by your algorithm, you should try to get the data cache miss ratio
+as low as possible by increasing your cache reuse.
+This group was taken from the whitepaper -Basic Performance Measurements for AMD Athlon 64,
+AMD Opteron and AMD Phenom Processors- by Paul J. Drongowski.
+
diff --git a/groups/k10/CPI.txt b/groups/k10/CPI.txt
new file mode 100644
index 000000000..6595c2d7b
--- /dev/null
+++ b/groups/k10/CPI.txt
@@ -0,0 +1,21 @@
+SHORT Cycles per instruction
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 CPU_CLOCKS_UNHALTED
+PMC2 UOPS_RETIRED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+CPI PMC1/PMC0
+CPI (based on uops) PMC1/PMC2
+IPC PMC0/PMC1
+
+LONG
+This group measures how efficiently the processor works with
+regard to instruction throughput. Also important as a standalone
+metric is INSTRUCTIONS_RETIRED as it tells you how many instructions
+you need to execute for a task. An optimization might show very
+low CPI values but execute many more instructions for it.
+
diff --git a/groups/k10/FLOPS_DP.txt b/groups/k10/FLOPS_DP.txt
new file mode 100644
index 000000000..4eccf8b64
--- /dev/null
+++ b/groups/k10/FLOPS_DP.txt
@@ -0,0 +1,22 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+PMC0 SSE_RETIRED_ADD_DOUBLE_FLOPS
+PMC1 SSE_RETIRED_MULT_DOUBLE_FLOPS
+PMC2 CPU_CLOCKS_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC2*inverseClock
+DP MFlops/s 1.0E-06*(PMC0+PMC1)/time
+DP Add MFlops/s 1.0E-06*PMC0/time
+DP Mult MFlops/s 1.0E-06*PMC1/time
+
+LONG
+Formulas:
+DP MFlops/s = 1.0E-06*(SSE_RETIRED_ADD_DOUBLE_FLOPS+SSE_RETIRED_MULT_DOUBLE_FLOPS)/time
+-
+Profiling group to measure double precision SSE flops.
+Don't forget that your code might also execute X87 flops.
+
+
diff --git a/groups/k10/FLOPS_SP.txt b/groups/k10/FLOPS_SP.txt
new file mode 100644
index 000000000..7a0bd52a1
--- /dev/null
+++ b/groups/k10/FLOPS_SP.txt
@@ -0,0 +1,22 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+PMC0 SSE_RETIRED_ADD_SINGLE_FLOPS
+PMC1 SSE_RETIRED_MULT_SINGLE_FLOPS
+PMC2 CPU_CLOCKS_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC2*inverseClock
+SP MFlops/s 1.0E-06*(PMC0+PMC1)/time
+SP Add MFlops/s 1.0E-06*PMC0/time
+SP Mult MFlops/s 1.0E-06*PMC1/time
+
+LONG
+Formulas:
+SP MFlops/s = 1.0E-06*(SSE_RETIRED_ADD_SINGLE_FLOPS+SSE_RETIRED_MULT_SINGLE_FLOPS)/time
+-
+Profiling group to measure single precision SSE flops.
+Don't forget that your code might also execute X87 flops.
+
+
diff --git a/groups/k10/FLOPS_X87.txt b/groups/k10/FLOPS_X87.txt
new file mode 100644
index 000000000..9a585b4e3
--- /dev/null
+++ b/groups/k10/FLOPS_X87.txt
@@ -0,0 +1,19 @@
+SHORT X87 MFlops/s
+
+EVENTSET
+PMC0 X87_FLOPS_RETIRED_ADD
+PMC1 X87_FLOPS_RETIRED_MULT
+PMC2 X87_FLOPS_RETIRED_DIV
+PMC3 CPU_CLOCKS_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC3*inverseClock
+X87 MFlops/s 1.0E-06*(PMC0+PMC1+PMC2)/time
+X87 Add MFlops/s 1.0E-06*PMC0/time
+X87 Mult MFlops/s 1.0E-06*PMC1/time
+X87 Div MFlops/s 1.0E-06*PMC2/time
+
+LONG
+Profiling group to measure X87 flop rates.
+
diff --git a/groups/k10/FPU_EXCEPTION.txt b/groups/k10/FPU_EXCEPTION.txt
new file mode 100644
index 000000000..eff87fca2
--- /dev/null
+++ b/groups/k10/FPU_EXCEPTION.txt
@@ -0,0 +1,21 @@
+SHORT Floating point exceptions
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 FP_INSTRUCTIONS_RETIRED_ALL
+PMC2 FPU_EXCEPTIONS_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Overall FP exception rate PMC2/PMC0
+FP exception rate PMC2/PMC1
+
+LONG
+Formulas:
+Overall FP exception rate = FPU_EXCEPTIONS_ALL / INSTRUCTIONS_RETIRED
+FP exception rate = FPU_EXCEPTIONS_ALL / FP_INSTRUCTIONS_RETIRED_ALL
+-
+Floating point exceptions occur e.g. on the treatment of denormals.
+There might be a large penalty if there are too many floating point
+exceptions.
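For the denormal remark above: a denormal (subnormal) value is one below the smallest normal floating point magnitude, and producing such values in hot loops can trigger the slow exception handling this group counts. The Python sketch below merely constructs such a value to show what "denormal" means; it does not measure the hardware penalty.

    import sys

    smallest_normal = sys.float_info.min  # ~2.2250738585072014e-308 for IEEE 754 doubles
    denormal = smallest_normal / 2**10    # subnormal: nonzero but below the normal range

    print(denormal != 0.0)                # True
    print(denormal < smallest_normal)     # True: this value needs gradual underflow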
+
diff --git a/groups/k10/ICACHE.txt b/groups/k10/ICACHE.txt
new file mode 100644
index 000000000..222ea5d5a
--- /dev/null
+++ b/groups/k10/ICACHE.txt
@@ -0,0 +1,25 @@
+SHORT Instruction cache miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 ICACHE_FETCHES
+PMC2 ICACHE_REFILLS_L2
+PMC3 ICACHE_REFILLS_MEM
+
+METRICS
+Runtime (RDTSC) [s] time
+Instruction cache misses PMC2+PMC3
+Instruction cache request rate PMC1/PMC0
+Instruction cache miss rate (PMC2+PMC3)/PMC0
+Instruction cache miss ratio (PMC2+PMC3)/PMC1
+
+LONG
+Formulas:
+Instruction cache misses = ICACHE_REFILLS_L2 + ICACHE_REFILLS_MEM
+Instruction cache request rate = ICACHE_FETCHES / INSTRUCTIONS_RETIRED
+Instruction cache miss rate = (ICACHE_REFILLS_L2+ICACHE_REFILLS_MEM)/INSTRUCTIONS_RETIRED
+Instruction cache miss ratio = (ICACHE_REFILLS_L2+ICACHE_REFILLS_MEM)/ICACHE_FETCHES
+-
+This group measures the locality of your instruction code with regard to the
+L1 I-cache.
+
diff --git a/groups/k10/L2.txt b/groups/k10/L2.txt
new file mode 100644
index 000000000..8b61bcc87
--- /dev/null
+++ b/groups/k10/L2.txt
@@ -0,0 +1,29 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+PMC0 DATA_CACHE_REFILLS_L2_ALL
+PMC1 DATA_CACHE_EVICTED_ALL
+PMC2 CPU_CLOCKS_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC2*inverseClock
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+L2 refill bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+
+LONG
+Formulas:
+L2 bandwidth [MBytes/s] 1.0E-06*(DATA_CACHE_REFILLS_L2_ALL+DATA_CACHE_EVICTED_ALL)*64/time
+L2 data volume [GBytes] 1.0E-09*(DATA_CACHE_REFILLS_L2_ALL+DATA_CACHE_EVICTED_ALL)*64
+L2 refill bandwidth [MBytes/s] 1.0E-06*DATA_CACHE_REFILLS_L2_ALL*64/time
+L2 evict [MBytes/s] 1.0E-06*DATA_CACHE_EVICTED_ALL*64/time
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is
+computed from the number of cachelines loaded from L2 to L1 and the
+number of modified cachelines evicted from the L1.
+Note that this bandwidth also includes data transfers due to a
+write allocate load on a store miss in L1 and copy back transfers if
+originated from L2.
+
diff --git a/groups/k10/L2CACHE.txt b/groups/k10/L2CACHE.txt
new file mode 100644
index 000000000..d384c485d
--- /dev/null
+++ b/groups/k10/L2CACHE.txt
@@ -0,0 +1,32 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 L2_REQUESTS_ALL
+PMC2 L2_MISSES_ALL
+PMC3 L2_FILL_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+L2 request rate (PMC1+PMC3)/PMC0
+L2 miss rate PMC2/PMC0
+L2 miss ratio PMC2/(PMC1+PMC3)
+
+LONG
+Formulas:
+L2 request rate = (L2_REQUESTS_ALL+L2_FILL_ALL)/INSTRUCTIONS_RETIRED
+L2 miss rate = L2_MISSES_ALL/INSTRUCTIONS_RETIRED
+L2 miss ratio = L2_MISSES_ALL/(L2_REQUESTS_ALL+L2_FILL_ALL)
+-
+This group measures the locality of your data accesses with regard to the
+L2 cache. The L2 request rate tells you how data intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The L2 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. Finally, the L2 miss ratio tells you how many of your
+memory references required a cacheline to be loaded from a higher level.
+While the data cache miss rate might be given by your algorithm, you should
+try to get the data cache miss ratio as low as possible by increasing your cache reuse.
+This group was taken from the whitepaper -Basic Performance Measurements for AMD Athlon 64,
+AMD Opteron and AMD Phenom Processors- by Paul J. Drongowski.
+
+
diff --git a/groups/k10/L3CACHE.txt b/groups/k10/L3CACHE.txt
new file mode 100644
index 000000000..85b4522d3
--- /dev/null
+++ b/groups/k10/L3CACHE.txt
@@ -0,0 +1,33 @@
+SHORT L3 cache miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 L3_READ_REQUEST_ALL_ALL_CORES
+PMC2 L3_MISSES_ALL_ALL_CORES
+
+METRICS
+Runtime (RDTSC) [s] time
+L3 request rate PMC1/PMC0
+L3 miss rate PMC2/PMC0
+L3 miss ratio PMC2/PMC1
+
+LONG
+Formulas:
+L3 request rate = L3_READ_REQUEST_ALL_ALL_CORES / INSTRUCTIONS_RETIRED
+L3 miss rate = L3_MISSES_ALL_ALL_CORES / INSTRUCTIONS_RETIRED
+L3 miss ratio = L3_MISSES_ALL_ALL_CORES / L3_READ_REQUEST_ALL_ALL_CORES
+-
+This group measures the locality of your data accesses with regard to the
+L3 cache. The L3 request rate tells you how data intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The L3 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. Finally, the L3 miss ratio tells you how many of your
+memory references required a cacheline to be loaded from a higher level.
+While the data cache miss rate might be given by your algorithm, you should
+try to get the data cache miss ratio as low as possible by increasing your cache reuse.
+Note: As this group measures the accesses from all cores, it only makes sense
+to measure with one core per socket, similar to the Intel Nehalem uncore events.
+This group was taken from the whitepaper -Basic Performance Measurements for AMD Athlon 64,
+AMD Opteron and AMD Phenom Processors- by Paul J. Drongowski.
+
+
diff --git a/groups/k10/MEM.txt b/groups/k10/MEM.txt
new file mode 100644
index 000000000..b6c9f3346
--- /dev/null
+++ b/groups/k10/MEM.txt
@@ -0,0 +1,26 @@
+SHORT Main memory bandwidth in MBytes/s
+
+EVENTSET
+PMC0 NORTHBRIDGE_READ_RESPONSE_ALL
+PMC1 OCTWORDS_WRITE_TRANSFERS
+PMC2 DRAM_ACCESSES_DCTO_ALL
+PMC3 DRAM_ACCESSES_DCT1_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Read data bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+Write data bandwidth [MBytes/s] 1.0E-06*PMC1*8.0/time
+Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*64.0
+
+LONG
+Formulas:
+Read data bandwidth [MBytes/s] = 1.0E-06*NORTHBRIDGE_READ_RESPONSE_ALL*64/time
+Write data bandwidth [MBytes/s] = 1.0E-06*OCTWORDS_WRITE_TRANSFERS*8/time
+Memory bandwidth [MBytes/s] = 1.0E-06*(DRAM_ACCESSES_DCTO_ALL+DRAM_ACCESSES_DCT1_ALL)*64/time
+Memory data volume [GBytes] = 1.0E-09*(DRAM_ACCESSES_DCTO_ALL+DRAM_ACCESSES_DCT1_ALL)*64
+-
+Profiling group to measure memory bandwidth drawn by all cores of a socket.
+Note: As this group measures the accesses from all cores, it only makes sense
+to measure with one core per socket, similar to the Intel Nehalem uncore events.
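Note the asymmetric scaling in this group: read responses are converted at 64 bytes per event, while write transfers are counted in octwords and the formula scales them by 8 bytes per event. A minimal sketch with invented counts, mirroring only the arithmetic of the formulas:

    nb_read_responses = 40_000_000  # hypothetical NORTHBRIDGE_READ_RESPONSE_ALL count
    octword_writes = 120_000_000    # hypothetical OCTWORDS_WRITE_TRANSFERS count
    runtime_s = 1.0

    read_bw = 1.0e-06 * nb_read_responses * 64.0 / runtime_s  # 64 bytes per read response
    write_bw = 1.0e-06 * octword_writes * 8.0 / runtime_s     # 8 bytes per write transfer event
    print(f"read {read_bw:.0f} MB/s, write {write_bw:.0f} MB/s")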
+
diff --git a/groups/k10/NUMA.txt b/groups/k10/NUMA.txt
new file mode 100644
index 000000000..9734e3c31
--- /dev/null
+++ b/groups/k10/NUMA.txt
@@ -0,0 +1,25 @@
+SHORT Read/Write events between the ccNUMA nodes
+
+EVENTSET
+PMC0 CPU_TO_DRAM_LOCAL_TO_0
+PMC1 CPU_TO_DRAM_LOCAL_TO_1
+PMC2 CPU_TO_DRAM_LOCAL_TO_2
+PMC3 CPU_TO_DRAM_LOCAL_TO_3
+
+METRICS
+Runtime (RDTSC) [s] time
+Mega requests per second to Node 0 1.0E-06*PMC0/time
+Mega requests per second to Node 1 1.0E-06*PMC1/time
+Mega requests per second to Node 2 1.0E-06*PMC2/time
+Mega requests per second to Node 3 1.0E-06*PMC3/time
+
+LONG
+Formulas:
+Mega requests per second to Node X 1.0E-06*CPU_TO_DRAM_LOCAL_TO_X/time
+-
+Profiling group to measure the traffic from the local CPU to the different
+DRAM NUMA nodes. This group helps to detect NUMA problems in a threaded
+code. You must first determine on which memory domains your code is running.
+A code should only have significant traffic to its own memory domain.
+
+
diff --git a/groups/k10/NUMA2.txt b/groups/k10/NUMA2.txt
new file mode 100644
index 000000000..dbfbbb08b
--- /dev/null
+++ b/groups/k10/NUMA2.txt
@@ -0,0 +1,24 @@
+SHORT Bandwidth on the Hypertransport links
+
+EVENTSET
+PMC0 CPU_TO_DRAM_LOCAL_TO_4
+PMC1 CPU_TO_DRAM_LOCAL_TO_5
+PMC2 CPU_TO_DRAM_LOCAL_TO_6
+PMC3 CPU_TO_DRAM_LOCAL_TO_7
+
+METRICS
+Runtime (RDTSC) [s] time
+Hyper Transport link0 bandwidth (MBytes/s) 1.0E-06*PMC0*4.0/time
+Hyper Transport link1 bandwidth (MBytes/s) 1.0E-06*PMC1*4.0/time
+Hyper Transport link2 bandwidth (MBytes/s) 1.0E-06*PMC2*4.0/time
+Hyper Transport link3 bandwidth (MBytes/s) 1.0E-06*PMC3*4.0/time
+
+LONG
+Formulas:
+Hyper Transport linkX bandwidth (MBytes/s) 1.0E-06*CPU_TO_DRAM_LOCAL_TO_X*4.0/time
+-
+Profiling group to measure the bandwidth over the Hypertransport links. Can be used
+to detect NUMA problems. Usually there should be only limited traffic over the
+Hypertransport links for optimal performance.
+
+
diff --git a/groups/k10/TLB.txt b/groups/k10/TLB.txt
new file mode 100644
index 000000000..29844919a
--- /dev/null
+++ b/groups/k10/TLB.txt
@@ -0,0 +1,35 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 DATA_CACHE_ACCESSES
+PMC2 DTLB_L2_HIT_ALL
+PMC3 DTLB_L2_MISS_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+L1 DTLB request rate PMC1/PMC0
+L1 DTLB miss rate (PMC2+PMC3)/PMC0
+L1 DTLB miss ratio (PMC2+PMC3)/PMC1
+L2 DTLB request rate (PMC2+PMC3)/PMC0
+L2 DTLB miss rate PMC3/PMC0
+L2 DTLB miss ratio PMC3/(PMC2+PMC3)
+
+
+LONG
+Formulas:
+L1 DTLB request rate = DATA_CACHE_ACCESSES / INSTRUCTIONS_RETIRED
+L1 DTLB miss rate = (DTLB_L2_HIT_ALL+DTLB_L2_MISS_ALL)/INSTRUCTIONS_RETIRED
+L1 DTLB miss ratio = (DTLB_L2_HIT_ALL+DTLB_L2_MISS_ALL)/DATA_CACHE_ACCESSES
+L2 DTLB request rate = (DTLB_L2_HIT_ALL+DTLB_L2_MISS_ALL)/INSTRUCTIONS_RETIRED
+L2 DTLB miss rate = DTLB_L2_MISS_ALL / INSTRUCTIONS_RETIRED
+L2 DTLB miss ratio = DTLB_L2_MISS_ALL / (DTLB_L2_HIT_ALL+DTLB_L2_MISS_ALL)
+-
+The L1 DTLB request rate tells you how data intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The DTLB miss rate gives a measure of how often a TLB miss occurred
+per instruction. Finally, the L1 DTLB miss ratio tells you how many
+of your memory references on average caused a TLB miss.
+NOTE: The L2 metrics are only relevant if the L2 DTLB request rate is equal to the L1 DTLB miss rate!
+This group was taken from the whitepaper -Basic Performance Measurements for AMD Athlon 64,
+AMD Opteron and AMD Phenom Processors- by Paul J. Drongowski.
diff --git a/groups/k8/BRANCH.txt b/groups/k8/BRANCH.txt
new file mode 100644
index 000000000..64e10cdd6
--- /dev/null
+++ b/groups/k8/BRANCH.txt
@@ -0,0 +1,31 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 BRANCH_RETIRED
+PMC2 BRANCH_MISPREDICT_RETIRED
+PMC3 BRANCH_TAKEN_RETIRED
+
+METRICS
+Runtime (RDTSC) [s] time
+Branch rate PMC1/PMC0
+Branch misprediction rate PMC2/PMC0
+Branch misprediction ratio PMC2/PMC1
+Branch taken rate PMC3/PMC0
+Branch taken ratio PMC3/PMC1
+Instructions per branch PMC0/PMC1
+
+LONG
+Formulas:
+Branch rate = BRANCH_RETIRED / INSTRUCTIONS_RETIRED
+Branch misprediction rate = BRANCH_MISPREDICT_RETIRED / INSTRUCTIONS_RETIRED
+Branch misprediction ratio = BRANCH_MISPREDICT_RETIRED / BRANCH_RETIRED
+Branch taken rate = BRANCH_TAKEN_RETIRED / INSTRUCTIONS_RETIRED
+Branch taken ratio = BRANCH_TAKEN_RETIRED / BRANCH_RETIRED
+Instructions per branch = INSTRUCTIONS_RETIRED / BRANCH_RETIRED
+-
+The rates state how often on average a branch or a mispredicted branch occurred
+per retired instruction. The branch misprediction ratio directly expresses what
+fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate. The same applies for the branches
+taken metrics.
diff --git a/groups/k8/CACHE.txt b/groups/k8/CACHE.txt
new file mode 100644
index 000000000..ff20b5ebd
--- /dev/null
+++ b/groups/k8/CACHE.txt
@@ -0,0 +1,33 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 DATA_CACHE_ACCESSES
+PMC2 DATA_CACHE_REFILLS_L2_ALL
+PMC3 DATA_CACHE_REFILLS_NORTHBRIDGE_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Data cache misses PMC2+PMC3
+Data cache request rate PMC1/PMC0
+Data cache miss rate (PMC2+PMC3)/PMC0
+Data cache miss ratio (PMC2+PMC3)/PMC1
+
+LONG
+Formulas:
+Data cache misses = DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL
+Data cache request rate = DATA_CACHE_ACCESSES / INSTRUCTIONS_RETIRED
+Data cache miss rate = (DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL)/INSTRUCTIONS_RETIRED
+Data cache miss ratio = (DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL)/DATA_CACHE_ACCESSES
+-
+This group measures the locality of your data accesses with regard to the
+L1 cache. The data cache request rate tells you how data intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The data cache miss rate gives a measure of how often it was necessary to get
+cachelines from higher levels of the memory hierarchy. Finally, the
+data cache miss ratio tells you how many of your memory references required
+a cacheline to be loaded from a higher level. While the data cache miss rate
+might be given by your algorithm, you should try to get the data cache miss ratio
+as low as possible by increasing your cache reuse.
+This group was taken from the whitepaper -Basic Performance Measurements for AMD Athlon 64,
+AMD Opteron and AMD Phenom Processors- by Paul J. Drongowski.
diff --git a/groups/k8/CPI.txt b/groups/k8/CPI.txt
new file mode 100644
index 000000000..6595c2d7b
--- /dev/null
+++ b/groups/k8/CPI.txt
@@ -0,0 +1,21 @@
+SHORT Cycles per instruction
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 CPU_CLOCKS_UNHALTED
+PMC2 UOPS_RETIRED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+CPI PMC1/PMC0
+CPI (based on uops) PMC1/PMC2
+IPC PMC0/PMC1
+
+LONG
+This group measures how efficiently the processor works with
+regard to instruction throughput. Also important as a standalone
+metric is INSTRUCTIONS_RETIRED as it tells you how many instructions
+you need to execute for a task. An optimization might show very
+low CPI values but execute many more instructions for it.
+
diff --git a/groups/k8/ICACHE.txt b/groups/k8/ICACHE.txt
new file mode 100644
index 000000000..222ea5d5a
--- /dev/null
+++ b/groups/k8/ICACHE.txt
@@ -0,0 +1,25 @@
+SHORT Instruction cache miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_RETIRED
+PMC1 ICACHE_FETCHES
+PMC2 ICACHE_REFILLS_L2
+PMC3 ICACHE_REFILLS_MEM
+
+METRICS
+Runtime (RDTSC) [s] time
+Instruction cache misses PMC2+PMC3
+Instruction cache request rate PMC1/PMC0
+Instruction cache miss rate (PMC2+PMC3)/PMC0
+Instruction cache miss ratio (PMC2+PMC3)/PMC1
+
+LONG
+Formulas:
+Instruction cache misses = ICACHE_REFILLS_L2 + ICACHE_REFILLS_MEM
+Instruction cache request rate = ICACHE_FETCHES / INSTRUCTIONS_RETIRED
+Instruction cache miss rate = (ICACHE_REFILLS_L2+ICACHE_REFILLS_MEM)/INSTRUCTIONS_RETIRED
+Instruction cache miss ratio = (ICACHE_REFILLS_L2+ICACHE_REFILLS_MEM)/ICACHE_FETCHES
+-
+This group measures the locality of your instruction code with regard to the
+L1 I-cache.
+
diff --git a/groups/k8/L2.txt b/groups/k8/L2.txt
new file mode 100644
index 000000000..58eae3bfc
--- /dev/null
+++ b/groups/k8/L2.txt
@@ -0,0 +1,31 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+PMC0 DATA_CACHE_REFILLS_L2_ALL
+PMC1 DATA_CACHE_EVICTED_ALL
+PMC2 CPU_CLOCKS_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC2*inverseClock
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+L2 refill bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+
+LONG
+Formulas:
+L2 bandwidth [MBytes/s] 1.0E-06*(DATA_CACHE_REFILLS_L2_ALL+DATA_CACHE_EVICTED_ALL)*64/time
+L2 data volume [GBytes] 1.0E-09*(DATA_CACHE_REFILLS_L2_ALL+DATA_CACHE_EVICTED_ALL)*64
+L2 refill bandwidth [MBytes/s] 1.0E-06*DATA_CACHE_REFILLS_L2_ALL*64/time
+L2 evict [MBytes/s] 1.0E-06*DATA_CACHE_EVICTED_ALL*64/time
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is
+computed from the number of cachelines loaded from L2 to L1 and the
+number of modified cachelines evicted from the L1.
+Note that this bandwidth also includes data transfers due to a
+write allocate load on a store miss in L1 and copy back transfers if
+originated from L2.
+
+
+
diff --git a/groups/kabini/BRANCH.txt b/groups/kabini/BRANCH.txt
new file mode 100644
index 000000000..1ae9f36a4
--- /dev/null
+++ b/groups/kabini/BRANCH.txt
@@ -0,0 +1,32 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 RETIRED_BRANCH_INSTR
+PMC2 RETIRED_MISPREDICTED_BRANCH_INSTR
+PMC3 RETIRED_TAKEN_BRANCH_INSTR
+
+METRICS
+Runtime (RDTSC) [s] time
+Branch rate PMC1/PMC0
+Branch misprediction rate PMC2/PMC0
+Branch misprediction ratio PMC2/PMC1
+Branch taken rate PMC3/PMC0
+Branch taken ratio PMC3/PMC1
+Instructions per branch PMC0/PMC1
+
+LONG
+Formulas:
+Branch rate = RETIRED_BRANCH_INSTR / RETIRED_INSTRUCTIONS
+Branch misprediction rate = RETIRED_MISPREDICTED_BRANCH_INSTR / RETIRED_INSTRUCTIONS
+Branch misprediction ratio = RETIRED_MISPREDICTED_BRANCH_INSTR / RETIRED_BRANCH_INSTR
+Branch taken rate = RETIRED_TAKEN_BRANCH_INSTR / RETIRED_INSTRUCTIONS
+Branch taken ratio = RETIRED_TAKEN_BRANCH_INSTR / RETIRED_BRANCH_INSTR
+Instructions per branch = RETIRED_INSTRUCTIONS / RETIRED_BRANCH_INSTR
+-
+The rates state how often on average a branch or a mispredicted branch occurred
+per retired instruction. The branch misprediction ratio directly expresses what
+fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate. The same applies for the branches
+taken metrics.
+
diff --git a/groups/kabini/CACHE.txt b/groups/kabini/CACHE.txt
new file mode 100644
index 000000000..ef62f76a0
--- /dev/null
+++ b/groups/kabini/CACHE.txt
@@ -0,0 +1,32 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 DATA_CACHE_ACCESSES
+PMC2 DATA_CACHE_REFILLS_ALL
+PMC3 DATA_CACHE_REFILLS_NB_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Data cache misses PMC2+PMC3
+Data cache request rate PMC1/PMC0
+Data cache miss rate (PMC2+PMC3)/PMC0
+Data cache miss ratio (PMC2+PMC3)/PMC1
+
+LONG
+Formulas:
+Data cache misses = DATA_CACHE_REFILLS_ALL + DATA_CACHE_REFILLS_NB_ALL
+Data cache request rate = DATA_CACHE_ACCESSES / RETIRED_INSTRUCTIONS
+Data cache miss rate = (DATA_CACHE_REFILLS_ALL + DATA_CACHE_REFILLS_NB_ALL)/RETIRED_INSTRUCTIONS
+Data cache miss ratio = (DATA_CACHE_REFILLS_ALL + DATA_CACHE_REFILLS_NB_ALL)/DATA_CACHE_ACCESSES
+-
+This group measures the locality of your data accesses with regard to the
+L1 cache. The data cache request rate tells you how data intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The data cache miss rate gives a measure of how often it was necessary to get
+cachelines from higher levels of the memory hierarchy. Finally, the
+data cache miss ratio tells you how many of your memory references required
+a cacheline to be loaded from a higher level. While the data cache miss rate
+might be given by your algorithm, you should try to get the data cache miss ratio
+as low as possible by increasing your cache reuse.
+
diff --git a/groups/kabini/CPI.txt b/groups/kabini/CPI.txt
new file mode 100644
index 000000000..47711f45b
--- /dev/null
+++ b/groups/kabini/CPI.txt
@@ -0,0 +1,21 @@
+SHORT Cycles per instruction
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 CPU_CLOCKS_UNHALTED
+PMC2 RETIRED_UOPS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+CPI PMC1/PMC0
+CPI (based on uops) PMC1/PMC2
+IPC PMC0/PMC1
+
+LONG
+This group measures how efficiently the processor works with
+regard to instruction throughput. Also important as a standalone
+metric is RETIRED_INSTRUCTIONS as it tells you how many instructions
+you need to execute for a task. An optimization might show very
+low CPI values but execute many more instructions for it.
+
diff --git a/groups/kabini/DATA.txt b/groups/kabini/DATA.txt
new file mode 100644
index 000000000..78e4c3c81
--- /dev/null
+++ b/groups/kabini/DATA.txt
@@ -0,0 +1,16 @@
+SHORT Load to store ratio
+
+EVENTSET
+PMC0 LS_DISPATCH_LOADS
+PMC1 LS_DISPATCH_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = LS_DISPATCH_LOADS / LS_DISPATCH_STORES
+-
+This is a simple metric to determine your load to store ratio.
+
diff --git a/groups/kabini/FLOPS_DP.txt b/groups/kabini/FLOPS_DP.txt
new file mode 100644
index 000000000..d7f5f57cb
--- /dev/null
+++ b/groups/kabini/FLOPS_DP.txt
@@ -0,0 +1,23 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 CPU_CLOCKS_UNHALTED
+PMC2 RETIRED_UOPS
+PMC3 RETIRED_FLOPS_DOUBLE_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+MFlops/s 1.0E-06*(PMC3)/time
+CPI PMC1/PMC0
+CPI (based on uops) PMC1/PMC2
+IPC PMC0/PMC1
+
+LONG
+Formulas:
+DP MFlops/s = 1.0E-06*(RETIRED_FLOPS_DOUBLE_ALL)/time
+-
+Profiling group to measure double precision flop rate.
+
+
diff --git a/groups/kabini/FLOPS_SP.txt b/groups/kabini/FLOPS_SP.txt
new file mode 100644
index 000000000..1c4dcc371
--- /dev/null
+++ b/groups/kabini/FLOPS_SP.txt
@@ -0,0 +1,23 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 CPU_CLOCKS_UNHALTED
+PMC2 RETIRED_UOPS
+PMC3 RETIRED_FLOPS_SINGLE_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+MFlops/s 1.0E-06*(PMC3)/time
+CPI PMC1/PMC0
+CPI (based on uops) PMC1/PMC2
+IPC PMC0/PMC1
+
+LONG
+Formulas:
+SP MFlops/s = 1.0E-06*(RETIRED_FLOPS_SINGLE_ALL)/time
+-
+Profiling group to measure single precision flop rate.
+
+
diff --git a/groups/kabini/FPU_EXCEPTION.txt b/groups/kabini/FPU_EXCEPTION.txt
new file mode 100644
index 000000000..23814dadb
--- /dev/null
+++ b/groups/kabini/FPU_EXCEPTION.txt
@@ -0,0 +1,21 @@
+SHORT Floating point exceptions
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 RETIRED_FP_INSTRUCTIONS_ALL
+PMC2 FPU_EXCEPTION_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Overall FP exception rate PMC2/PMC0
+FP exception rate PMC2/PMC1
+
+LONG
+Formulas:
+Overall FP exception rate = FPU_EXCEPTION_ALL / RETIRED_INSTRUCTIONS
+FP exception rate = FPU_EXCEPTION_ALL / RETIRED_FP_INSTRUCTIONS_ALL
+-
+Floating point exceptions occur e.g. on the treatment of denormals.
+There might be a large penalty if there are too many floating point
+exceptions.
+
diff --git a/groups/kabini/ICACHE.txt b/groups/kabini/ICACHE.txt
new file mode 100644
index 000000000..be5e5f591
--- /dev/null
+++ b/groups/kabini/ICACHE.txt
@@ -0,0 +1,25 @@
+SHORT Instruction cache miss rate/ratio
+
+EVENTSET
+PMC0 INSTRUCTION_CACHE_FETCHES
+PMC1 INSTRUCTION_CACHE_L2_REFILLS
+PMC2 INSTRUCTION_CACHE_SYSTEM_REFILLS
+PMC3 RETIRED_INSTRUCTIONS
+
+METRICS
+Runtime (RDTSC) [s] time
+Instruction cache misses PMC1+PMC2
+Instruction cache request rate PMC0/PMC3
+Instruction cache miss rate (PMC1+PMC2)/PMC3
+Instruction cache miss ratio (PMC1+PMC2)/PMC0
+
+LONG
+Formulas:
+Instruction cache misses = INSTRUCTION_CACHE_L2_REFILLS + INSTRUCTION_CACHE_SYSTEM_REFILLS
+Instruction cache request rate = INSTRUCTION_CACHE_FETCHES / RETIRED_INSTRUCTIONS
+Instruction cache miss rate = (INSTRUCTION_CACHE_L2_REFILLS + INSTRUCTION_CACHE_SYSTEM_REFILLS)/RETIRED_INSTRUCTIONS
+Instruction cache miss ratio = (INSTRUCTION_CACHE_L2_REFILLS + INSTRUCTION_CACHE_SYSTEM_REFILLS)/INSTRUCTION_CACHE_FETCHES
+-
+This group measures the locality of your instruction code with regard to the
+L1 I-cache.
+
diff --git a/groups/kabini/L2.txt b/groups/kabini/L2.txt
new file mode 100644
index 000000000..d06d80974
--- /dev/null
+++ b/groups/kabini/L2.txt
@@ -0,0 +1,29 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+PMC0 DATA_CACHE_REFILLS_ALL
+PMC1 DATA_CACHE_EVICTED_ALL
+PMC2 CPU_CLOCKS_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC2*inverseClock
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+L2 refill bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+
+LONG
+Formulas:
+L2 bandwidth [MBytes/s] = 1.0E-06*(DATA_CACHE_REFILLS_ALL+DATA_CACHE_EVICTED_ALL)*64/time
+L2 data volume [GBytes] = 1.0E-09*(DATA_CACHE_REFILLS_ALL+DATA_CACHE_EVICTED_ALL)*64
+L2 refill bandwidth [MBytes/s] = 1.0E-06*DATA_CACHE_REFILLS_ALL*64/time
+L2 evict [MBytes/s] = 1.0E-06*DATA_CACHE_EVICTED_ALL*64/time
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is
+computed from the number of cachelines loaded from L2 to L1 and the
+number of modified cachelines evicted from L1.
+Note that this bandwidth also includes data transfers due to a
+write allocate load on a store miss in L1 and copy back transfers
+if they originate from L2.
+
diff --git a/groups/kabini/MEM.txt b/groups/kabini/MEM.txt
new file mode 100644
index 000000000..22aa19ef5
--- /dev/null
+++ b/groups/kabini/MEM.txt
@@ -0,0 +1,20 @@
+SHORT Main memory bandwidth in MBytes/s
+
+EVENTSET
+UPMC0 UNC_DRAM_ACCESSES_DCT0_ALL
+UPMC1 UNC_DRAM_ACCESSES_DCT1_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Memory bandwidth [MBytes/s] 1.0E-06*(UPMC0+UPMC1)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(UPMC0+UPMC1)*64.0
+
+LONG
+Formulas:
+Memory bandwidth [MBytes/s] = 1.0E-06*(UNC_DRAM_ACCESSES_DCT0_ALL+UNC_DRAM_ACCESSES_DCT1_ALL)*64/time
+Memory data volume [GBytes] = 1.0E-09*(UNC_DRAM_ACCESSES_DCT0_ALL+UNC_DRAM_ACCESSES_DCT1_ALL)*64
+-
+Profiling group to measure the memory bandwidth drawn by all cores of a socket.
+Note: As this group measures the accesses from all cores, it only makes sense
+to measure with one core per socket, similar to the Intel Nehalem Uncore events.
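The bandwidth arithmetic used throughout these groups follows one pattern:
every counted event stands for one 64-byte cacheline transfer, so bandwidth
and data volume follow directly from the counts and the runtime. A sketch
with hypothetical counter values:

# Illustration only: hypothetical uncore counter readings.
dct0 = 40_000_000  # UPMC0 UNC_DRAM_ACCESSES_DCT0_ALL
dct1 = 38_000_000  # UPMC1 UNC_DRAM_ACCESSES_DCT1_ALL
time = 2.0         # measured runtime in seconds (RDTSC)

CACHELINE = 64  # bytes moved per counted DRAM access
bandwidth_mbytes_s = 1.0e-6 * (dct0 + dct1) * CACHELINE / time
data_volume_gbytes = 1.0e-9 * (dct0 + dct1) * CACHELINE

print(f"Memory bandwidth [MBytes/s]: {bandwidth_mbytes_s:.1f}")
print(f"Memory data volume [GBytes]: {data_volume_gbytes:.3f}")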
+
diff --git a/groups/kabini/NUMA.txt b/groups/kabini/NUMA.txt
new file mode 100644
index 000000000..d94e735dd
--- /dev/null
+++ b/groups/kabini/NUMA.txt
@@ -0,0 +1,28 @@
+SHORT Read/Write Events between the ccNUMA nodes
+
+EVENTSET
+UPMC0 UNC_CPU_TO_DRAM_LOCAL_TO_0
+UPMC1 UNC_CPU_TO_DRAM_LOCAL_TO_1
+UPMC2 UNC_CPU_TO_DRAM_LOCAL_TO_2
+UPMC3 UNC_CPU_TO_DRAM_LOCAL_TO_3
+
+METRICS
+Runtime (RDTSC) [s] time
+DRAM read/write local to 0 [MegaEvents/s] 1.0E-06*UPMC0/time
+DRAM read/write local to 1 [MegaEvents/s] 1.0E-06*UPMC1/time
+DRAM read/write local to 2 [MegaEvents/s] 1.0E-06*UPMC2/time
+DRAM read/write local to 3 [MegaEvents/s] 1.0E-06*UPMC3/time
+
+LONG
+Formulas:
+DRAM read/write local to 0 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_0/time
+DRAM read/write local to 1 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_1/time
+DRAM read/write local to 2 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_2/time
+DRAM read/write local to 3 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_3/time
+-
+Profiling group to measure the traffic from the local CPU to the different
+DRAM NUMA nodes. This group allows you to detect NUMA problems in threaded
+code. You must first determine on which memory domains your code is running.
+A code should only have significant traffic to its own memory domain.
+
diff --git a/groups/kabini/NUMA2.txt b/groups/kabini/NUMA2.txt
new file mode 100644
index 000000000..b10e6fba9
--- /dev/null
+++ b/groups/kabini/NUMA2.txt
@@ -0,0 +1,28 @@
+SHORT Read/Write Events between the ccNUMA nodes
+
+EVENTSET
+UPMC0 UNC_CPU_TO_DRAM_LOCAL_TO_4
+UPMC1 UNC_CPU_TO_DRAM_LOCAL_TO_5
+UPMC2 UNC_CPU_TO_DRAM_LOCAL_TO_6
+UPMC3 UNC_CPU_TO_DRAM_LOCAL_TO_7
+
+METRICS
+Runtime (RDTSC) [s] time
+DRAM read/write local to 4 [MegaEvents/s] 1.0E-06*UPMC0/time
+DRAM read/write local to 5 [MegaEvents/s] 1.0E-06*UPMC1/time
+DRAM read/write local to 6 [MegaEvents/s] 1.0E-06*UPMC2/time
+DRAM read/write local to 7 [MegaEvents/s] 1.0E-06*UPMC3/time
+
+LONG
+Formulas:
+DRAM read/write local to 4 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_4/time
+DRAM read/write local to 5 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_5/time
+DRAM read/write local to 6 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_6/time
+DRAM read/write local to 7 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_7/time
+-
+Profiling group to measure the traffic from the local CPU to the different
+DRAM NUMA nodes. This group allows you to detect NUMA problems in threaded
+code. You must first determine on which memory domains your code is running.
+A code should only have significant traffic to its own memory domain.
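To act on the NUMA groups, compare the per-node event rates: traffic should
concentrate on the node the threads run on. A minimal sketch with four
hypothetical counter readings:

# Illustration only: hypothetical per-node event counts.
node_events = {0: 120_000_000, 1: 2_000_000, 2: 1_500_000, 3: 1_800_000}
time = 1.0  # seconds

total = sum(node_events.values())
for node, events in node_events.items():
    rate = 1.0e-6 * events / time  # MegaEvents/s, as in the METRICS section
    share = events / total
    print(f"DRAM read/write local to {node}: {rate:8.2f} MegaEvents/s ({share:6.1%})")
# A well-pinned code shows one dominant node; a flat profile hints at NUMA problems.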
+
+
diff --git a/groups/kabini/TLB.txt b/groups/kabini/TLB.txt
new file mode 100644
index 000000000..4f170ee0c
--- /dev/null
+++ b/groups/kabini/TLB.txt
@@ -0,0 +1,33 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+PMC0 RETIRED_INSTRUCTIONS
+PMC1 DATA_CACHE_ACCESSES
+PMC2 L2_DTLB_HIT_ALL
+PMC3 DTLB_MISS_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+L1 DTLB request rate PMC1/PMC0
+L1 DTLB miss rate (PMC2+PMC3)/PMC0
+L1 DTLB miss ratio (PMC2+PMC3)/PMC1
+L2 DTLB request rate (PMC2+PMC3)/PMC0
+L2 DTLB miss rate PMC3/PMC0
+L2 DTLB miss ratio PMC3/(PMC2+PMC3)
+
+LONG
+Formulas:
+L1 DTLB request rate = DATA_CACHE_ACCESSES / RETIRED_INSTRUCTIONS
+L1 DTLB miss rate = (L2_DTLB_HIT_ALL+DTLB_MISS_ALL)/RETIRED_INSTRUCTIONS
+L1 DTLB miss ratio = (L2_DTLB_HIT_ALL+DTLB_MISS_ALL)/DATA_CACHE_ACCESSES
+L2 DTLB request rate = (L2_DTLB_HIT_ALL+DTLB_MISS_ALL)/RETIRED_INSTRUCTIONS
+L2 DTLB miss rate = DTLB_MISS_ALL / RETIRED_INSTRUCTIONS
+L2 DTLB miss ratio = DTLB_MISS_ALL / (L2_DTLB_HIT_ALL+DTLB_MISS_ALL)
+-
+The L1 DTLB request rate tells you how data-intensive your code is,
+i.e. how many data accesses you perform on average per instruction.
+The DTLB miss rate gives a measure of how often a TLB miss occurred
+per instruction. Finally, the L1 DTLB miss ratio tells you what
+fraction of your memory references caused a TLB miss on average.
+NOTE: The L2 metrics are only valid if the L2 DTLB request rate equals the L1 DTLB miss rate!
diff --git a/groups/nehalem/BRANCH.txt b/groups/nehalem/BRANCH.txt
new file mode 100644
index 000000000..3d814167f
--- /dev/null
+++ b/groups/nehalem/BRANCH.txt
@@ -0,0 +1,31 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 BR_INST_RETIRED_ALL_BRANCHES
+PMC1 BR_MISP_RETIRED_ALL_BRANCHES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Branch rate PMC0/FIXC0
+Branch misprediction rate PMC1/FIXC0
+Branch misprediction ratio PMC1/PMC0
+Instructions per branch FIXC0/PMC0
+
+LONG
+Formulas:
+Branch rate = BR_INST_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES / BR_INST_RETIRED_ALL_BRANCHES
+Instructions per branch = INSTR_RETIRED_ANY / BR_INST_RETIRED_ALL_BRANCHES
+-
+The rates state how often, on average, a branch or a mispredicted branch
+occurred per retired instruction. The Branch misprediction ratio expresses
+what fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate.
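The Intel groups from here on add the fixed counters FIXC0-2, from which the
derived Clock and CPI metrics are computed. A sketch of that arithmetic with
hypothetical readings; inverseClock is the reciprocal of the nominal clock
(2.4 GHz assumed here):

# Illustration only: hypothetical fixed-counter readings.
instr = 2_000_000_000     # FIXC0 INSTR_RETIRED_ANY
clk_core = 2_600_000_000  # FIXC1 CPU_CLK_UNHALTED_CORE
clk_ref = 2_400_000_000   # FIXC2 CPU_CLK_UNHALTED_REF
inverse_clock = 1.0 / 2.4e9  # 1 / nominal frequency (assumed 2.4 GHz)

runtime_unhalted = clk_core * inverse_clock                # [s]
clock_mhz = 1.0e-6 * (clk_core / clk_ref) / inverse_clock  # actual core clock
cpi = clk_core / instr

print(f"Runtime unhalted [s]: {runtime_unhalted:.3f}")
print(f"Clock [MHz]: {clock_mhz:.0f}")  # above nominal indicates turbo mode
print(f"CPI: {cpi:.2f}")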
+
diff --git a/groups/nehalem/CACHE.txt b/groups/nehalem/CACHE.txt
new file mode 100644
index 000000000..c3e989cff
--- /dev/null
+++ b/groups/nehalem/CACHE.txt
@@ -0,0 +1,35 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPL
+PMC1 L1D_ALL_REF_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Data cache misses PMC0
+Data cache request rate PMC1/FIXC0
+Data cache miss rate PMC0/FIXC0
+Data cache miss ratio PMC0/PMC1
+
+LONG
+Formulas:
+Data cache request rate = L1D_ALL_REF_ANY / INSTR_RETIRED_ANY
+Data cache miss rate = L1D_REPL / INSTR_RETIRED_ANY
+Data cache miss ratio = L1D_REPL / L1D_ALL_REF_ANY
+-
+This group measures the locality of your data accesses with regard to the
+L1 cache. The Data cache request rate tells you how data-intensive your
+code is, i.e. how many data accesses you perform on average per instruction.
+The Data cache miss rate gives a measure of how often it was necessary to
+fetch cachelines from higher levels of the memory hierarchy. Finally, the
+Data cache miss ratio tells you what fraction of your memory references
+required a cacheline to be loaded from a higher level. While the Data cache
+miss rate might be dictated by your algorithm, you should try to keep the
+Data cache miss ratio as low as possible by increasing your cache reuse.
+
diff --git a/groups/nehalem/DATA.txt b/groups/nehalem/DATA.txt
new file mode 100644
index 000000000..a5611bc19
--- /dev/null
+++ b/groups/nehalem/DATA.txt
@@ -0,0 +1,22 @@
+SHORT Load to store ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 MEM_INST_RETIRED_LOADS
+PMC1 MEM_INST_RETIRED_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = MEM_INST_RETIRED_LOADS / MEM_INST_RETIRED_STORES
+-
+This is a simple metric to determine your load to store ratio.
+
diff --git a/groups/nehalem/FLOPS_DP.txt b/groups/nehalem/FLOPS_DP.txt
new file mode 100644
index 000000000..c5ba91c69
--- /dev/null
+++ b/groups/nehalem/FLOPS_DP.txt
@@ -0,0 +1,31 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+DP MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+
+LONG
+Formula:
+DP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*2 + FP_COMP_OPS_EXE_SSE_FP_SCALAR)/runtime
+-
+The Nehalem has no possibility to measure MFlops when mixed-precision
+calculations are done. Therefore both single and double precision are
+measured to ensure the correctness of the measurements. You can check
+whether your code was vectorized by comparing the number of
+FP_COMP_OPS_EXE_SSE_FP_PACKED versus FP_COMP_OPS_EXE_SSE_FP_SCALAR.
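The flop rate arithmetic: each packed SSE uop carries two double precision
operations, a scalar uop one. A sketch with hypothetical counts:

# Illustration only: hypothetical counter readings.
packed = 500_000_000  # PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
scalar = 50_000_000   # PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
time = 1.2            # seconds

dp_mflops = 1.0e-6 * (packed * 2.0 + scalar) / time
packed_fraction = packed / (packed + scalar)  # rough vectorization indicator

print(f"DP MFlops/s: {dp_mflops:.1f}")
print(f"Packed fraction of FP uops: {packed_fraction:.1%}")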
+
diff --git a/groups/nehalem/FLOPS_SP.txt b/groups/nehalem/FLOPS_SP.txt
new file mode 100644
index 000000000..4478c8f38
--- /dev/null
+++ b/groups/nehalem/FLOPS_SP.txt
@@ -0,0 +1,31 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+SP MFlops/s (SP assumed) 1.0E-06*(PMC0*4.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+
+LONG
+Formula:
+SP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*4 + FP_COMP_OPS_EXE_SSE_FP_SCALAR)/runtime
+-
+The Nehalem has no possibility to measure MFlops when mixed-precision
+calculations are done. Therefore both single and double precision are
+measured to ensure the correctness of the measurements. You can check
+whether your code was vectorized by comparing the number of
+FP_COMP_OPS_EXE_SSE_FP_PACKED versus FP_COMP_OPS_EXE_SSE_FP_SCALAR.
+
diff --git a/groups/nehalem/FLOPS_X87.txt b/groups/nehalem/FLOPS_X87.txt
new file mode 100644
index 000000000..6447b930e
--- /dev/null
+++ b/groups/nehalem/FLOPS_X87.txt
@@ -0,0 +1,18 @@
+SHORT X87 MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 INST_RETIRED_X87
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+X87 MFlops/s 1.0E-06*PMC0/time
+
+LONG
+Profiling group to measure the X87 flop rate.
+
diff --git a/groups/nehalem/L2.txt b/groups/nehalem/L2.txt
new file mode 100644
index 000000000..d1930472f
--- /dev/null
+++ b/groups/nehalem/L2.txt
@@ -0,0 +1,32 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPL
+PMC1 L1D_M_EVICT
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L2 Load [MBytes/s] = 1.0E-06*L1D_REPL*64/time
+L2 Evict [MBytes/s] = 1.0E-06*L1D_M_EVICT*64/time
+L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPL+L1D_M_EVICT)*64/time
+L2 data volume [GBytes] = 1.0E-09*(L1D_REPL+L1D_M_EVICT)*64
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is
+computed from the number of cachelines allocated in L1 and the
+number of modified cachelines evicted from L1.
+Note that this bandwidth also includes data transfers due to a
+write allocate load on a store miss in L1.
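Since the L2 bandwidth is the sum of a load stream (L1D_REPL) and an evict
stream (L1D_M_EVICT), the split is often as instructive as the total: a
read-only kernel shows almost no evicts, a store-heavy kernel a large evict
share. A sketch, counter values hypothetical:

# Illustration only: hypothetical counter readings.
l1d_repl = 90_000_000   # PMC0: cachelines loaded into L1
l1d_evict = 30_000_000  # PMC1: modified cachelines evicted from L1
time = 1.0              # seconds

load_bw = 1.0e-6 * l1d_repl * 64.0 / time
evict_bw = 1.0e-6 * l1d_evict * 64.0 / time
print(f"L2 Load  [MBytes/s]: {load_bw:.1f}")
print(f"L2 Evict [MBytes/s]: {evict_bw:.1f}")
print(f"Evict share: {l1d_evict / (l1d_repl + l1d_evict):.1%}")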
+
diff --git a/groups/nehalem/L2CACHE.txt b/groups/nehalem/L2CACHE.txt
new file mode 100644
index 000000000..0fd60da27
--- /dev/null
+++ b/groups/nehalem/L2CACHE.txt
@@ -0,0 +1,34 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_DATA_RQSTS_DEMAND_ANY
+PMC1 L2_RQSTS_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 request rate PMC0/FIXC0
+L2 miss rate PMC1/FIXC0
+L2 miss ratio PMC1/PMC0
+
+LONG
+Formulas:
+L2 request rate = L2_DATA_RQSTS_DEMAND_ANY / INSTR_RETIRED_ANY
+L2 miss rate = L2_RQSTS_MISS / INSTR_RETIRED_ANY
+L2 miss ratio = L2_RQSTS_MISS / L2_DATA_RQSTS_DEMAND_ANY
+-
+This group measures the locality of your data accesses with regard to the
+L2 cache. The L2 request rate tells you how data-intensive your code is,
+i.e. how many data accesses you perform on average per instruction.
+The L2 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. Finally, the L2 miss ratio tells you what fraction
+of your memory references required a cacheline to be loaded from a higher
+level. While the L2 miss rate might be dictated by your algorithm, you
+should try to keep the L2 miss ratio as low as possible by increasing your
+cache reuse.
+
diff --git a/groups/nehalem/L3.txt b/groups/nehalem/L3.txt
new file mode 100644
index 000000000..446afee89
--- /dev/null
+++ b/groups/nehalem/L3.txt
@@ -0,0 +1,32 @@
+SHORT L3 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_LINES_IN_ANY
+PMC1 L2_LINES_OUT_DEMAND_DIRTY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L3 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L3 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L3 Load [MBytes/s] = 1.0E-06*L2_LINES_IN_ANY*64/time
+L3 Evict [MBytes/s] = 1.0E-06*L2_LINES_OUT_DEMAND_DIRTY*64/time
+L3 bandwidth [MBytes/s] = 1.0E-06*(L2_LINES_IN_ANY+L2_LINES_OUT_DEMAND_DIRTY)*64/time
+L3 data volume [GBytes] = 1.0E-09*(L2_LINES_IN_ANY+L2_LINES_OUT_DEMAND_DIRTY)*64
+-
+Profiling group to measure L3 cache bandwidth. The bandwidth is computed
+from the number of cachelines allocated in L2 and the number of modified
+cachelines evicted from L2. It also reports the total data volume between
+the L3 and L2 caches.
+Note that this bandwidth also includes data transfers due to a
+write allocate load on a store miss in L2.
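Rate and ratio answer different questions: the miss rate normalizes misses
to all retired instructions, the miss ratio to the cache requests only. A
sketch of the L2CACHE metrics, values hypothetical:

# Illustration only: hypothetical counter readings.
instr = 1_000_000_000     # FIXC0 INSTR_RETIRED_ANY
l2_requests = 80_000_000  # PMC0 L2_DATA_RQSTS_DEMAND_ANY
l2_misses = 8_000_000     # PMC1 L2_RQSTS_MISS

request_rate = l2_requests / instr    # how data-intensive the code is
miss_rate = l2_misses / instr         # misses per instruction
miss_ratio = l2_misses / l2_requests  # fraction of L2 requests that missed

print(f"L2 request rate: {request_rate:.3f}")
print(f"L2 miss rate:    {miss_rate:.4f}")
print(f"L2 miss ratio:   {miss_ratio:.2%}")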
+
diff --git a/groups/nehalem/L3CACHE.txt b/groups/nehalem/L3CACHE.txt
new file mode 100644
index 000000000..b6ec110f7
--- /dev/null
+++ b/groups/nehalem/L3CACHE.txt
@@ -0,0 +1,36 @@
+SHORT L3 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+UPMC0 UNC_L3_HITS_ANY
+UPMC1 UNC_L3_MISS_ANY
+UPMC2 UNC_L3_LINES_IN_ANY
+UPMC3 UNC_L3_LINES_OUT_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L3 request rate UPMC0/FIXC0
+L3 miss rate UPMC1/FIXC0
+L3 miss ratio UPMC1/UPMC0
+
+LONG
+Formulas:
+L3 request rate = UNC_L3_HITS_ANY / INSTR_RETIRED_ANY
+L3 miss rate = UNC_L3_MISS_ANY / INSTR_RETIRED_ANY
+L3 miss ratio = UNC_L3_MISS_ANY / UNC_L3_HITS_ANY
+-
+This group measures the locality of your data accesses with regard to the
+L3 cache. The L3 request rate tells you how data-intensive your code is,
+i.e. how many data accesses you perform on average per instruction.
+The L3 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. Finally, the L3 miss ratio tells you what fraction
+of your memory references required a cacheline to be loaded from a higher
+level. While the L3 miss rate might be dictated by your algorithm, you
+should try to keep the L3 miss ratio as low as possible by increasing your
+cache reuse.
+
diff --git a/groups/nehalem/MEM.txt b/groups/nehalem/MEM.txt
new file mode 100644
index 000000000..087b269b0
--- /dev/null
+++ b/groups/nehalem/MEM.txt
@@ -0,0 +1,36 @@
+SHORT Main memory bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+UPMC0 UNC_QMC_NORMAL_READS_ANY
+UPMC1 UNC_QMC_WRITES_FULL_ANY
+UPMC2 UNC_QHL_REQUESTS_REMOTE_READS
+UPMC3 UNC_QHL_REQUESTS_LOCAL_READS
+UPMC4 UNC_QHL_REQUESTS_REMOTE_WRITES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Memory bandwidth [MBytes/s] 1.0E-06*(UPMC0+UPMC1)*64/time
+Memory data volume [GBytes] 1.0E-09*(UPMC0+UPMC1)*64
+Remote Read BW [MBytes/s] 1.0E-06*(UPMC2)*64/time
+Remote Write BW [MBytes/s] 1.0E-06*(UPMC4)*64/time
+Remote BW [MBytes/s] 1.0E-06*(UPMC2+UPMC4)*64/time
+
+LONG
+Formulas:
+Memory bandwidth [MBytes/s] = 1.0E-06*(UNC_QMC_NORMAL_READS_ANY+UNC_QMC_WRITES_FULL_ANY)*64/time
+Memory data volume [GBytes] = 1.0E-09*(UNC_QMC_NORMAL_READS_ANY+UNC_QMC_WRITES_FULL_ANY)*64
+Remote Read BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_READS)*64/time
+Remote Write BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_WRITES)*64/time
+Remote BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_READS+UNC_QHL_REQUESTS_REMOTE_WRITES)*64/time
+-
+Profiling group to measure the memory bandwidth drawn by all cores of a socket.
+This group should be measured with one core per socket. The Remote Read BW
+tells you whether cachelines are transferred between sockets, meaning that
+cores access data owned by a remote NUMA domain.
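For the MEM group, a useful derived quantity is the remote share of the read
traffic: reads serviced by the other socket point at unfavorable page
placement or thread pinning. A sketch with hypothetical counts:

# Illustration only: hypothetical counter readings.
remote_reads = 5_000_000  # UPMC2 UNC_QHL_REQUESTS_REMOTE_READS
local_reads = 95_000_000  # UPMC3 UNC_QHL_REQUESTS_LOCAL_READS
time = 1.0                # seconds

remote_read_bw = 1.0e-6 * remote_reads * 64 / time
remote_share = remote_reads / (remote_reads + local_reads)
print(f"Remote Read BW [MBytes/s]: {remote_read_bw:.1f}")
print(f"Remote share of reads: {remote_share:.1%}")  # should stay small for pinned codes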
+
diff --git a/groups/nehalem/SCHEDULER.txt b/groups/nehalem/SCHEDULER.txt
new file mode 100644
index 000000000..a7bbe37fc
--- /dev/null
+++ b/groups/nehalem/SCHEDULER.txt
@@ -0,0 +1,21 @@
+SHORT Scheduler Ports
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 UOPS_EXECUTED_PORT0
+PMC1 UOPS_EXECUTED_PORT1
+PMC2 UOPS_EXECUTED_PORT5
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+RATIO Port 1 PMC1/PMC0
+RATIO Port 5 PMC2/PMC0
+
+LONG
+Measures how many uops were executed on which issue port,
+relative to port 0.
+
diff --git a/groups/nehalem/TLB.txt b/groups/nehalem/TLB.txt
new file mode 100644
index 000000000..5f93d6648
--- /dev/null
+++ b/groups/nehalem/TLB.txt
@@ -0,0 +1,30 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 DTLB_MISSES_ANY
+PMC1 L1D_ALL_REF_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L1 DTLB request rate PMC1/FIXC0
+L1 DTLB miss rate PMC0/FIXC0
+L1 DTLB miss ratio PMC0/PMC1
+
+LONG
+Formulas:
+L1 DTLB request rate = L1D_ALL_REF_ANY / INSTR_RETIRED_ANY
+L1 DTLB miss rate = DTLB_MISSES_ANY / INSTR_RETIRED_ANY
+L1 DTLB miss ratio = DTLB_MISSES_ANY / L1D_ALL_REF_ANY
+-
+The L1 DTLB request rate tells you how data-intensive your code is,
+i.e. how many data accesses you perform on average per instruction.
+The DTLB miss rate gives a measure of how often a TLB miss occurred
+per instruction. Finally, the L1 DTLB miss ratio tells you what
+fraction of your memory references caused a TLB miss on average.
+
diff --git a/groups/nehalemEX/BRANCH.txt b/groups/nehalemEX/BRANCH.txt
new file mode 100644
index 000000000..3d814167f
--- /dev/null
+++ b/groups/nehalemEX/BRANCH.txt
@@ -0,0 +1,31 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 BR_INST_RETIRED_ALL_BRANCHES
+PMC1 BR_MISP_RETIRED_ALL_BRANCHES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Branch rate PMC0/FIXC0
+Branch misprediction rate PMC1/FIXC0
+Branch misprediction ratio PMC1/PMC0
+Instructions per branch FIXC0/PMC0
+
+LONG
+Formulas:
+Branch rate = BR_INST_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES / BR_INST_RETIRED_ALL_BRANCHES
+Instructions per branch = INSTR_RETIRED_ANY / BR_INST_RETIRED_ALL_BRANCHES
+-
+The rates state how often, on average, a branch or a mispredicted branch
+occurred per retired instruction. The Branch misprediction ratio expresses
+what fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate.
+
diff --git a/groups/nehalemEX/CACHE.txt b/groups/nehalemEX/CACHE.txt
new file mode 100644
index 000000000..c3e989cff
--- /dev/null
+++ b/groups/nehalemEX/CACHE.txt
@@ -0,0 +1,35 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPL
+PMC1 L1D_ALL_REF_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Data cache misses PMC0
+Data cache request rate PMC1/FIXC0
+Data cache miss rate PMC0/FIXC0
+Data cache miss ratio PMC0/PMC1
+
+LONG
+Formulas:
+Data cache request rate = L1D_ALL_REF_ANY / INSTR_RETIRED_ANY
+Data cache miss rate = L1D_REPL / INSTR_RETIRED_ANY
+Data cache miss ratio = L1D_REPL / L1D_ALL_REF_ANY
+-
+This group measures the locality of your data accesses with regard to the
+L1 cache. The Data cache request rate tells you how data-intensive your
+code is, i.e. how many data accesses you perform on average per instruction.
+The Data cache miss rate gives a measure of how often it was necessary to
+fetch cachelines from higher levels of the memory hierarchy. Finally, the
+Data cache miss ratio tells you what fraction of your memory references
+required a cacheline to be loaded from a higher level. While the Data cache
+miss rate might be dictated by your algorithm, you should try to keep the
+Data cache miss ratio as low as possible by increasing your cache reuse.
+
diff --git a/groups/nehalemEX/DATA.txt b/groups/nehalemEX/DATA.txt
new file mode 100644
index 000000000..a5611bc19
--- /dev/null
+++ b/groups/nehalemEX/DATA.txt
@@ -0,0 +1,22 @@
+SHORT Load to store ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 MEM_INST_RETIRED_LOADS
+PMC1 MEM_INST_RETIRED_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = MEM_INST_RETIRED_LOADS / MEM_INST_RETIRED_STORES
+-
+This is a simple metric to determine your load to store ratio.
+
diff --git a/groups/nehalemEX/FLOPS_DP.txt b/groups/nehalemEX/FLOPS_DP.txt
new file mode 100644
index 000000000..c5ba91c69
--- /dev/null
+++ b/groups/nehalemEX/FLOPS_DP.txt
@@ -0,0 +1,31 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+DP MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+
+LONG
+Formula:
+DP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*2 + FP_COMP_OPS_EXE_SSE_FP_SCALAR)/runtime
+-
+The Nehalem has no possibility to measure MFlops when mixed-precision
+calculations are done. Therefore both single and double precision are
+measured to ensure the correctness of the measurements. You can check
+whether your code was vectorized by comparing the number of
+FP_COMP_OPS_EXE_SSE_FP_PACKED versus FP_COMP_OPS_EXE_SSE_FP_SCALAR.
+
diff --git a/groups/nehalemEX/FLOPS_SP.txt b/groups/nehalemEX/FLOPS_SP.txt
new file mode 100644
index 000000000..4478c8f38
--- /dev/null
+++ b/groups/nehalemEX/FLOPS_SP.txt
@@ -0,0 +1,31 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+SP MFlops/s (SP assumed) 1.0E-06*(PMC0*4.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+
+LONG
+Formula:
+SP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*4 + FP_COMP_OPS_EXE_SSE_FP_SCALAR)/runtime
+-
+The Nehalem has no possibility to measure MFlops when mixed-precision
+calculations are done. Therefore both single and double precision are
+measured to ensure the correctness of the measurements. You can check
+whether your code was vectorized by comparing the number of
+FP_COMP_OPS_EXE_SSE_FP_PACKED versus FP_COMP_OPS_EXE_SSE_FP_SCALAR.
+
diff --git a/groups/nehalemEX/FLOPS_X87.txt b/groups/nehalemEX/FLOPS_X87.txt
new file mode 100644
index 000000000..6447b930e
--- /dev/null
+++ b/groups/nehalemEX/FLOPS_X87.txt
@@ -0,0 +1,18 @@
+SHORT X87 MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 INST_RETIRED_X87
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+X87 MFlops/s 1.0E-06*PMC0/time
+
+LONG
+Profiling group to measure the X87 flop rate.
+
diff --git a/groups/nehalemEX/L2.txt b/groups/nehalemEX/L2.txt
new file mode 100644
index 000000000..2734c5d07
--- /dev/null
+++ b/groups/nehalemEX/L2.txt
@@ -0,0 +1,33 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPL
+PMC1 L1D_M_EVICT
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L2 Load [MBytes/s] = 1.0E-06*L1D_REPL*64/time
+L2 Evict [MBytes/s] = 1.0E-06*L1D_M_EVICT*64/time
+L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPL+L1D_M_EVICT)*64/time
+L2 data volume [GBytes] = 1.0E-09*(L1D_REPL+L1D_M_EVICT)*64
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is
+computed from the number of cachelines allocated in L1 and the
+number of modified cachelines evicted from L1. It also reports the
+total data volume transferred between the L2 and L1 caches.
+Note that this bandwidth also includes data transfers due to a
+write allocate load on a store miss in L1.
+
diff --git a/groups/nehalemEX/L2CACHE.txt b/groups/nehalemEX/L2CACHE.txt
new file mode 100644
index 000000000..49778be04
--- /dev/null
+++ b/groups/nehalemEX/L2CACHE.txt
@@ -0,0 +1,35 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_DATA_RQSTS_DEMAND_ANY
+PMC1 L2_RQSTS_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 request rate PMC0/FIXC0
+L2 miss rate PMC1/FIXC0
+L2 miss ratio PMC1/PMC0
+
+LONG
+Formulas:
+L2 request rate = L2_DATA_RQSTS_DEMAND_ANY / INSTR_RETIRED_ANY
+L2 miss rate = L2_RQSTS_MISS / INSTR_RETIRED_ANY
+L2 miss ratio = L2_RQSTS_MISS / L2_DATA_RQSTS_DEMAND_ANY
+-
+This group measures the locality of your data accesses with regard to the
+L2 cache. The L2 request rate tells you how data-intensive your code is,
+i.e. how many data accesses you perform on average per instruction.
+The L2 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. Finally, the L2 miss ratio tells you what fraction
+of your memory references required a cacheline to be loaded from a higher
+level. While the L2 miss rate might be dictated by your algorithm, you
+should try to keep the L2 miss ratio as low as possible by increasing your
+cache reuse.
+Note: This group might need to be revised!
+
diff --git a/groups/nehalemEX/MEM.txt b/groups/nehalemEX/MEM.txt
new file mode 100644
index 000000000..86a2e97bf
--- /dev/null
+++ b/groups/nehalemEX/MEM.txt
@@ -0,0 +1,39 @@
+SHORT Main memory bandwidth
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+WBOX4 UNCORE_CYCLES
+MBOX0C0 FVC_EV0_BBOX_CMDS_READS
+MBOX0C1 FVC_EV0_BBOX_RSP_ACK
+MBOX1C0 FVC_EV0_BBOX_CMDS_READS
+MBOX1C1 FVC_EV0_BBOX_RSP_ACK
+BBOX0C1 IMT_INSERTS_WR
+BBOX1C1 IMT_INSERTS_WR
+RBOX0C0 NEW_PACKETS_RECV_PORT0_IPERF0_ANY_DRS
+RBOX0C1 NEW_PACKETS_RECV_PORT1_IPERF0_ANY_DRS
+RBOX1C0 NEW_PACKETS_RECV_PORT4_IPERF0_ANY_DRS
+RBOX1C1 NEW_PACKETS_RECV_PORT5_IPERF0_ANY_DRS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+Uncore Clock [MHz] 1.E-06*(WBOX4)/time
+CPI FIXC1/FIXC0
+Memory Read BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0)*64/time
+Memory Write BW [MBytes/s] 1.0E-06*(BBOX0C1+BBOX1C1)*64/time
+Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+BBOX0C1+BBOX1C1)*64/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+BBOX0C1+BBOX1C1)*64
+Remote write data traffic Port 0 [MBytes/s] 1.0E-06*(RBOX0C0)*64/time
+Remote write data traffic Port 1 [MBytes/s] 1.0E-06*(RBOX0C1)*64/time
+Remote write data traffic Port 4 [MBytes/s] 1.0E-06*(RBOX1C0)*64/time
+Remote write data traffic Port 5 [MBytes/s] 1.0E-06*(RBOX1C1)*64/time
+
+LONG
+Profiling group to measure the memory bandwidth drawn by all cores of a
+socket. In addition to the bandwidth it also outputs the data volume and
+the remote traffic over QPI links to other sockets.
+
diff --git a/groups/nehalemEX/SCHEDULER.txt b/groups/nehalemEX/SCHEDULER.txt
new file mode 100644
index 000000000..a7bbe37fc
--- /dev/null
+++ b/groups/nehalemEX/SCHEDULER.txt
@@ -0,0 +1,21 @@
+SHORT Scheduler Ports
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 UOPS_EXECUTED_PORT0
+PMC1 UOPS_EXECUTED_PORT1
+PMC2 UOPS_EXECUTED_PORT5
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+RATIO Port 1 PMC1/PMC0
+RATIO Port 5 PMC2/PMC0
+
+LONG
+Measures how many uops were executed on which issue port,
+relative to port 0.
+
diff --git a/groups/nehalemEX/TLB.txt b/groups/nehalemEX/TLB.txt
new file mode 100644
index 000000000..5f93d6648
--- /dev/null
+++ b/groups/nehalemEX/TLB.txt
@@ -0,0 +1,30 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 DTLB_MISSES_ANY
+PMC1 L1D_ALL_REF_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L1 DTLB request rate PMC1/FIXC0
+L1 DTLB miss rate PMC0/FIXC0
+L1 DTLB miss ratio PMC0/PMC1
+
+LONG
+Formulas:
+L1 DTLB request rate = L1D_ALL_REF_ANY / INSTR_RETIRED_ANY
+L1 DTLB miss rate = DTLB_MISSES_ANY / INSTR_RETIRED_ANY
+L1 DTLB miss ratio = DTLB_MISSES_ANY / L1D_ALL_REF_ANY
+-
+The L1 DTLB request rate tells you how data-intensive your code is,
+i.e. how many data accesses you perform on average per instruction.
+The DTLB miss rate gives a measure of how often a TLB miss occurred
+per instruction. Finally, the L1 DTLB miss ratio tells you what
+fraction of your memory references caused a TLB miss on average.
+
diff --git a/groups/phi/CACHE.txt b/groups/phi/CACHE.txt
new file mode 100644
index 000000000..d61196576
--- /dev/null
+++ b/groups/phi/CACHE.txt
@@ -0,0 +1,19 @@
+SHORT Compute to Data Access Ratio
+
+EVENTSET
+PMC0 VPU_ELEMENTS_ACTIVE
+PMC1 DATA_READ_OR_WRITE
+
+METRICS
+Runtime (RDTSC) [s] time
+L1 compute intensity PMC0/PMC1
+
+LONG
+This metric is a way to measure the computational density of an
+application, i.e. how many computations it performs on average for each
+piece of data loaded. The L1 compute to data access ratio should be
+used to judge the suitability of an application for running on the Intel
+MIC architecture. Applications that will perform well on the Intel MIC
+architecture should be vectorized and ideally be able to perform multiple
+operations on the same pieces of data (or the same cachelines).
+
diff --git a/groups/phi/CPI.txt b/groups/phi/CPI.txt
new file mode 100644
index 000000000..8d4cf36bd
--- /dev/null
+++ b/groups/phi/CPI.txt
@@ -0,0 +1,19 @@
+SHORT Cycles per instruction
+
+EVENTSET
+PMC0 INSTRUCTIONS_EXECUTED
+PMC1 CPU_CLK_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] PMC1*inverseClock
+CPI PMC1/PMC0
+IPC PMC0/PMC1
+
+LONG
+This group measures how efficiently the processor works with
+regard to instruction throughput. INSTRUCTIONS_EXECUTED is also
+important as a standalone metric, as it tells you how many instructions
+you need to execute for a task. An optimization might show very
+low CPI values but execute many more instructions to do so.
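On the Phi, the per-element event VPU_ELEMENTS_ACTIVE makes the compute
intensity metrics simple divisions. A sketch combining the CACHE group's L1
compute intensity with CPI, all counter values hypothetical:

# Illustration only: hypothetical counter readings.
vpu_elements = 800_000_000  # VPU_ELEMENTS_ACTIVE
data_refs = 200_000_000     # DATA_READ_OR_WRITE
instr = 600_000_000         # INSTRUCTIONS_EXECUTED
cycles = 1_200_000_000      # CPU_CLK_UNHALTED

l1_compute_intensity = vpu_elements / data_refs  # operations per L1 data access
cpi = cycles / instr

print(f"L1 compute intensity: {l1_compute_intensity:.2f}")
print(f"CPI: {cpi:.2f}")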
+
diff --git a/groups/phi/L2CACHE.txt b/groups/phi/L2CACHE.txt
new file mode 100644
index 000000000..228a5bafa
--- /dev/null
+++ b/groups/phi/L2CACHE.txt
@@ -0,0 +1,19 @@
+SHORT L2 Compute to Data Access Ratio
+
+EVENTSET
+PMC0 VPU_ELEMENTS_ACTIVE
+PMC1 DATA_READ_MISS_OR_WRITE_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+L2 compute intensity PMC0/PMC1
+
+LONG
+This metric is a way to measure the computational density of an
+application, i.e. how many computations it performs on average for each
+piece of data loaded. The L2 compute to data access ratio should be
+used to judge the suitability of an application for running on the Intel
+MIC architecture. Applications that will perform well on the Intel MIC
+architecture should be vectorized and ideally be able to perform multiple
+operations on the same pieces of data (or the same cachelines).
+
diff --git a/groups/phi/MEM1.txt b/groups/phi/MEM1.txt
new file mode 100644
index 000000000..16e44e038
--- /dev/null
+++ b/groups/phi/MEM1.txt
@@ -0,0 +1,13 @@
+SHORT L2 Write Misses
+
+EVENTSET
+PMC0 L2_DATA_WRITE_MISS_MEM_FILL
+
+METRICS
+Runtime (RDTSC) [s] time
+RFO Data Bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+RFO Volume [GBytes] 1.0E-09*PMC0*64.0
+
+LONG
+Profiling group to measure the bandwidth and data volume caused by L2
+data write misses that are filled from memory (RFO traffic).
+
diff --git a/groups/phi/MEM2.txt b/groups/phi/MEM2.txt
new file mode 100644
index 000000000..9be1f2a82
--- /dev/null
+++ b/groups/phi/MEM2.txt
@@ -0,0 +1,13 @@
+SHORT L2 Read Misses
+
+EVENTSET
+PMC0 L2_DATA_READ_MISS_MEM_FILL
+
+METRICS
+Runtime (RDTSC) [s] time
+Read Data Bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+Read Data Volume [GBytes] 1.0E-09*PMC0*64.0
+
+LONG
+Profiling group to measure the bandwidth and data volume caused by L2
+data read misses that are filled from memory.
+
diff --git a/groups/phi/MEM3.txt b/groups/phi/MEM3.txt
new file mode 100644
index 000000000..45ce0de0c
--- /dev/null
+++ b/groups/phi/MEM3.txt
@@ -0,0 +1,13 @@
+SHORT HW prefetch transfers
+
+EVENTSET
+PMC0 HWP_L2MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Prefetch Data Bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+Prefetch Data Volume [GBytes] 1.0E-09*PMC0*64.0
+
+LONG
+Profiling group to measure the bandwidth and data volume caused by
+hardware prefetches that miss the L2 cache.
+
diff --git a/groups/phi/MEM4.txt b/groups/phi/MEM4.txt
new file mode 100644
index 000000000..0c24762a9
--- /dev/null
+++ b/groups/phi/MEM4.txt
@@ -0,0 +1,13 @@
+SHORT L2 Victim requests
+
+EVENTSET
+PMC0 L2_VICTIM_REQ_WITH_DATA
+
+METRICS
+Runtime (RDTSC) [s] time
+Victim Data Bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+Victim Data Volume [GBytes] 1.0E-09*PMC0*64.0
+
+LONG
+Profiling group to measure the bandwidth and data volume caused by L2
+victim requests that carry data.
+
diff --git a/groups/phi/MEM5.txt b/groups/phi/MEM5.txt
new file mode 100644
index 000000000..ade982833
--- /dev/null
+++ b/groups/phi/MEM5.txt
@@ -0,0 +1,13 @@
+SHORT L2 Snoop hits
+
+EVENTSET
+PMC0 SNP_HITM_L2
+
+METRICS
+Runtime (RDTSC) [s] time
+Snoop Data Bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+Snoop Data Volume [GBytes] 1.0E-09*PMC0*64.0
+
+LONG
+Profiling group to measure the bandwidth and data volume caused by snoop
+requests that hit a modified cacheline in the L2.
+
diff --git a/groups/phi/MEM6.txt b/groups/phi/MEM6.txt
new file mode 100644
index 000000000..41be52e5b
--- /dev/null
+++ b/groups/phi/MEM6.txt
@@ -0,0 +1,13 @@
+SHORT L2 Read Misses
+
+EVENTSET
+PMC0 L2_READ_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+L2 Read Data Bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 Read Data Volume [GBytes] 1.0E-09*PMC0*64.0
+
+LONG
+Profiling group to measure the bandwidth and data volume caused by L2
+read misses.
+
diff --git a/groups/phi/PAIRING.txt b/groups/phi/PAIRING.txt
new file mode 100644
index 000000000..2e93cc84e
--- /dev/null
+++ b/groups/phi/PAIRING.txt
@@ -0,0 +1,13 @@
+SHORT Pairing ratio
+
+EVENTSET
+PMC0 INSTRUCTIONS_EXECUTED
+PMC1 INSTRUCTIONS_EXECUTED_V_PIPE
+
+METRICS
+Runtime (RDTSC) [s] time
+VPipeRatio PMC1/PMC0
+PairingRatio PMC1/(PMC0-PMC1)
+
+LONG
+Ratio of instructions executed in the V-pipe to all executed instructions.
+A high pairing ratio means that instructions are frequently issued
+pairwise to both execution pipelines.
diff --git a/groups/phi/READ_MISS_RATIO.txt b/groups/phi/READ_MISS_RATIO.txt
new file mode 100644
index 000000000..c98f91b5f
--- /dev/null
+++ b/groups/phi/READ_MISS_RATIO.txt
@@ -0,0 +1,12 @@
+SHORT Miss ratio for data read
+
+EVENTSET
+PMC0 DATA_READ
+PMC1 DATA_READ_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Miss ratio PMC1/PMC0
+
+LONG
+Miss ratio for data reads.
+
diff --git a/groups/phi/VECTOR.txt b/groups/phi/VECTOR.txt
new file mode 100644
index 000000000..1e91bc46d
--- /dev/null
+++ b/groups/phi/VECTOR.txt
@@ -0,0 +1,15 @@
+SHORT Vector unit usage
+
+EVENTSET
+PMC0 VPU_INSTRUCTIONS_EXECUTED
+PMC1 VPU_ELEMENTS_ACTIVE
+
+METRICS
+Runtime (RDTSC) [s] time
+Vectorization Intensity PMC1/PMC0
+
+LONG
+Vector instructions include instructions that perform floating-point
+operations, instructions that load vector registers from memory and store
+them to memory, instructions to manipulate vector mask registers, and other
+special purpose instructions such as vector shuffle.
+
diff --git a/groups/phi/VECTOR2.txt b/groups/phi/VECTOR2.txt
new file mode 100644
index 000000000..487460cc6
--- /dev/null
+++ b/groups/phi/VECTOR2.txt
@@ -0,0 +1,17 @@
+SHORT Vector unit usage
+
+EVENTSET
+PMC0 VPU_INSTRUCTIONS_EXECUTED
+PMC1 VPU_STALL_REG
+
+METRICS
+Runtime (RDTSC) [s] time
+VPU register stall time [s] PMC1*inverseClock
+
+LONG
+This group measures the time the vector unit stalled due to register
+dependencies. Together with VPU_INSTRUCTIONS_EXECUTED it gives an
+impression of how smoothly the vector unit is utilized.
+
diff --git a/groups/phi/VPU_FILL_RATIO_DBL.txt b/groups/phi/VPU_FILL_RATIO_DBL.txt
new file mode 100644
index 000000000..50d38356b
--- /dev/null
+++ b/groups/phi/VPU_FILL_RATIO_DBL.txt
@@ -0,0 +1,12 @@
+SHORT VPU filling for Double
+
+EVENTSET
+PMC0 VPU_INSTRUCTIONS_EXECUTED
+PMC1 VPU_ELEMENTS_ACTIVE
+
+METRICS
+Runtime (RDTSC) [s] time
+VPUFillRatio PMC0*8/PMC1
+
+LONG
+VPU fill ratio for double precision: the maximum possible number of
+double precision elements (8 per VPU instruction) relative to the vector
+elements actually active. A value close to 1 indicates fully filled
+vector registers.
+
diff --git a/groups/phi/VPU_PAIRING.txt b/groups/phi/VPU_PAIRING.txt
new file mode 100644
index 000000000..998c1d7dd
--- /dev/null
+++ b/groups/phi/VPU_PAIRING.txt
@@ -0,0 +1,13 @@
+SHORT VPU Pairing ratio
+
+EVENTSET
+PMC0 VPU_INSTRUCTIONS_EXECUTED
+PMC1 VPU_INSTRUCTIONS_EXECUTED_V_PIPE
+
+METRICS
+Runtime (RDTSC) [s] time
+VPipeRatio PMC1/PMC0
+PairingRatio PMC1/(PMC0-PMC1)
+
+LONG
+VPU pairing ratio, analogous to the PAIRING group but restricted to VPU
+instructions.
+
diff --git a/groups/phi/VPU_READ_MISS_RATIO.txt b/groups/phi/VPU_READ_MISS_RATIO.txt
new file mode 100644
index 000000000..94ec96351
--- /dev/null
+++ b/groups/phi/VPU_READ_MISS_RATIO.txt
@@ -0,0 +1,12 @@
+SHORT Miss ratio for VPU data read
+
+EVENTSET
+PMC0 VPU_DATA_READ
+PMC1 VPU_DATA_READ_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Miss ratio PMC1/PMC0
+
+LONG
+Miss ratio for VPU data reads.
+
diff --git a/groups/phi/VPU_WRITE_MISS_RATIO.txt b/groups/phi/VPU_WRITE_MISS_RATIO.txt
new file mode 100644
index 000000000..429ee6d3e
--- /dev/null
+++ b/groups/phi/VPU_WRITE_MISS_RATIO.txt
@@ -0,0 +1,12 @@
+SHORT Miss ratio for VPU data write
+
+EVENTSET
+PMC0 VPU_DATA_WRITE
+PMC1 VPU_DATA_WRITE_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Miss ratio PMC1/PMC0
+
+LONG
+Miss ratio for VPU data writes.
+
diff --git a/groups/phi/WRITE_MISS_RATIO.txt b/groups/phi/WRITE_MISS_RATIO.txt
new file mode 100644
index 000000000..0544b0eef
--- /dev/null
+++ b/groups/phi/WRITE_MISS_RATIO.txt
@@ -0,0 +1,12 @@
+SHORT Miss ratio for data write
+
+EVENTSET
+PMC0 DATA_WRITE
+PMC1 DATA_WRITE_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Miss ratio PMC1/PMC0
+
+LONG
+Miss ratio for data writes.
diff --git a/groups/sandybridge/BRANCH.txt b/groups/sandybridge/BRANCH.txt
new file mode 100644
index 000000000..cbaf83451
--- /dev/null
+++ b/groups/sandybridge/BRANCH.txt
@@ -0,0 +1,31 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 BR_INST_RETIRED_ALL_BRANCHES
+PMC1 BR_MISP_RETIRED_ALL_BRANCHES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Branch rate PMC0/FIXC0
+Branch misprediction rate PMC1/FIXC0
+Branch misprediction ratio PMC1/PMC0
+Instructions per branch FIXC0/PMC0
+
+LONG
+Formulas:
+Branch rate = BR_INST_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES / BR_INST_RETIRED_ALL_BRANCHES
+Instructions per branch = INSTR_RETIRED_ANY / BR_INST_RETIRED_ALL_BRANCHES
+-
+The rates state how often, on average, a branch or a mispredicted branch
+occurred per retired instruction. The Branch misprediction ratio expresses
+what fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/Branch rate.
+
diff --git a/groups/sandybridge/DATA.txt b/groups/sandybridge/DATA.txt
new file mode 100644
index 000000000..5f04a23a8
--- /dev/null
+++ b/groups/sandybridge/DATA.txt
@@ -0,0 +1,22 @@
+SHORT Load to store ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 MEM_UOP_RETIRED_LOADS
+PMC1 MEM_UOP_RETIRED_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = MEM_UOP_RETIRED_LOADS / MEM_UOP_RETIRED_STORES
+-
+This is a simple metric to determine your load to store ratio.
+
diff --git a/groups/sandybridge/ENERGY.txt b/groups/sandybridge/ENERGY.txt
new file mode 100644
index 000000000..33a8cf419
--- /dev/null
+++ b/groups/sandybridge/ENERGY.txt
@@ -0,0 +1,25 @@
+SHORT Power and Energy consumption
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PWR0 PWR_PKG_ENERGY
+PWR3 PWR_DRAM_ENERGY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Energy [J] PWR0
+Energy DRAM [J] PWR3
+Power [W] PWR0/time
+
+LONG
+Formula:
+Power = PWR_PKG_ENERGY / time
+-
+SandyBridge implements the new RAPL interface. This interface enables
+monitoring of the consumed energy at the package (socket) level; the
+PWR_DRAM_ENERGY counter additionally reports the energy consumed by DRAM.
+
diff --git a/groups/sandybridge/FLOPS_AVX.txt b/groups/sandybridge/FLOPS_AVX.txt
new file mode 100644
index 000000000..a30abb245
--- /dev/null
+++ b/groups/sandybridge/FLOPS_AVX.txt
@@ -0,0 +1,25 @@
+SHORT Packed AVX MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_256_PACKED_SINGLE
+PMC1 FP_256_PACKED_DOUBLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+32b packed SP MFlops/s 1.0E-06*(PMC0*8.0)/time
+32b packed DP MFlops/s 1.0E-06*(PMC1*4.0)/time
+
+LONG
+Formula:
+32b packed SP MFlops/s = (FP_256_PACKED_SINGLE*8)/runtime
+32b packed DP MFlops/s = (FP_256_PACKED_DOUBLE*4)/runtime
+-
+Packed 32b AVX flop rates.
+Please note that the current flop measurements on SandyBridge are
+potentially wrong, so you cannot trust these counters at the moment!
+
diff --git a/groups/sandybridge/FLOPS_DP.txt b/groups/sandybridge/FLOPS_DP.txt
new file mode 100644
index 000000000..47fe80596
--- /dev/null
+++ b/groups/sandybridge/FLOPS_DP.txt
@@ -0,0 +1,29 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE
+PMC2 FP_256_PACKED_DOUBLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+32b AVX MFlops/s 1.0E-06*(PMC2*4.0)/time
+Packed MUOPS/s 1.0E-06*(PMC0+PMC2)/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+
+LONG
+Formula:
+MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE*2 + FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE)/runtime
+32b AVX MFlops/s = (FP_256_PACKED_DOUBLE*4)/runtime
+-
+SSE scalar and packed double precision flop rates. Please note that the
+current flop measurements on SandyBridge are potentially wrong, so you
+cannot trust these counters at the moment!
+
diff --git a/groups/sandybridge/FLOPS_SP.txt b/groups/sandybridge/FLOPS_SP.txt
new file mode 100644
index 000000000..b66b82f2c
--- /dev/null
+++ b/groups/sandybridge/FLOPS_SP.txt
@@ -0,0 +1,29 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE
+PMC2 FP_256_PACKED_SINGLE
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+MFlops/s 1.0E-06*(PMC0*4.0+PMC1)/time
+32b AVX MFlops/s 1.0E-06*(PMC2*8.0)/time
+Packed MUOPS/s 1.0E-06*(PMC0+PMC2)/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+
+LONG
+Formula:
+MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE*4 + FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE)/runtime
+32b AVX MFlops/s = (FP_256_PACKED_SINGLE*8)/runtime
+-
+SSE scalar and packed single precision flop rates. Please note that the
+current flop measurements on SandyBridge are potentially wrong, so you
+cannot trust these counters at the moment!
+
diff --git a/groups/sandybridge/L2.txt b/groups/sandybridge/L2.txt
new file mode 100644
index 000000000..5345b7aba
--- /dev/null
+++ b/groups/sandybridge/L2.txt
@@ -0,0 +1,32 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPLACEMENT
+PMC1 L1D_M_EVICT
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L2 Load [MBytes/s] = 1.0E-06*L1D_REPLACEMENT*64/time
+L2 Evict [MBytes/s] = 1.0E-06*L1D_M_EVICT*64/time
+L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPLACEMENT+L1D_M_EVICT)*64/time
+L2 data volume [GBytes] = 1.0E-09*(L1D_REPLACEMENT+L1D_M_EVICT)*64
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is computed
+from the number of cachelines allocated in L1 and the number of modified
+cachelines evicted from L1. The group also outputs the total data volume
+transferred between L2 and L1.
+Note that this bandwidth also includes data transfers due to a write
+allocate load on a store miss in L1.
+
diff --git a/groups/sandybridge/L2CACHE.txt b/groups/sandybridge/L2CACHE.txt
new file mode 100644
index 000000000..3d7c36ea1
--- /dev/null
+++ b/groups/sandybridge/L2CACHE.txt
@@ -0,0 +1,35 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_TRANS_ALL_REQUESTS
+PMC1 L2_RQSTS_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 request rate PMC0/FIXC0
+L2 miss rate PMC1/FIXC0
+L2 miss ratio PMC1/PMC0
+
+LONG
+Formulas:
+L2 request rate = L2_TRANS_ALL_REQUESTS / INSTR_RETIRED_ANY
+L2 miss rate = L2_RQSTS_MISS / INSTR_RETIRED_ANY
+L2 miss ratio = L2_RQSTS_MISS / L2_TRANS_ALL_REQUESTS
+-
+This group measures the locality of your data accesses with regard to the
+L2 cache. The L2 request rate tells you how data-intensive your code is,
+i.e. how many data accesses you perform on average per instruction.
+The L2 miss rate gives a measure of how often it was necessary to get
+cachelines from memory. Finally, the L2 miss ratio tells you what fraction
+of your memory references required a cacheline to be loaded from a higher
+level. While the L2 miss rate might be dictated by your algorithm, you
+should try to keep the L2 miss ratio as low as possible by increasing your
+cache reuse.
+Note: This group might need to be revised!
+
diff --git a/groups/sandybridge/L3.txt b/groups/sandybridge/L3.txt
new file mode 100644
index 000000000..9a7c914b7
--- /dev/null
+++ b/groups/sandybridge/L3.txt
@@ -0,0 +1,32 @@
+SHORT L3 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_LINES_IN_ALL
+PMC1 L2_LINES_OUT_DIRTY_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L3 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L3 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L3 Load [MBytes/s] = 1.0E-06*L2_LINES_IN_ALL*64/time
+L3 Evict [MBytes/s] = 1.0E-06*L2_LINES_OUT_DIRTY_ALL*64/time
+L3 bandwidth [MBytes/s] = 1.0E-06*(L2_LINES_IN_ALL+L2_LINES_OUT_DIRTY_ALL)*64/time
+L3 data volume [GBytes] = 1.0E-09*(L2_LINES_IN_ALL+L2_LINES_OUT_DIRTY_ALL)*64
+-
+Profiling group to measure L3 cache bandwidth. The bandwidth is computed
+from the number of cachelines allocated in L2 and the number of modified
+cachelines evicted from L2. This group also outputs the data volume
+transferred between the L3 and the measured cores' L2 caches. Note that
+this bandwidth also includes data transfers due to a write allocate load
+on a store miss in L2.
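These cache bandwidth groups become most useful when the measured volume is
checked against an analytic estimate for the kernel. For a stream triad
a[i] = b[i] + s*c[i] over N doubles one expects roughly four cacheline
streams through the memory hierarchy (loads of b and c, the store of a, and
the write allocate of a). A sketch under exactly these assumptions, with
hypothetical counter readings:

# Illustration only: array length and counter values are hypothetical.
n = 20_000_000                    # elements per array
expected_gb = 1.0e-9 * n * 8 * 4  # 8 bytes/double, 4 streams incl. write allocate

lines_in = 7_600_000              # PMC0 L2_LINES_IN_ALL
lines_out = 2_600_000             # PMC1 L2_LINES_OUT_DIRTY_ALL
measured_gb = 1.0e-9 * (lines_in + lines_out) * 64

print(f"Expected volume [GBytes]: {expected_gb:.2f}")
print(f"Measured volume [GBytes]: {measured_gb:.2f}")
# A large excess over the estimate hints at extra traffic, e.g. from
# cache-unfriendly access patterns.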
+
diff --git a/groups/sandybridge/MEM.txt b/groups/sandybridge/MEM.txt
new file mode 100644
index 000000000..96e77a7ff
--- /dev/null
+++ b/groups/sandybridge/MEM.txt
@@ -0,0 +1,30 @@
+SHORT Main memory bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+MBOX0C0 CAS_COUNT_RD
+MBOX1C0 CAS_COUNT_WR
+MBOX0C1 CAS_COUNT_RD
+MBOX1C1 CAS_COUNT_WR
+MBOX0C2 CAS_COUNT_RD
+MBOX1C2 CAS_COUNT_WR
+MBOX0C3 CAS_COUNT_RD
+MBOX1C3 CAS_COUNT_WR
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Memory Read BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3)*64.0/time
+Memory Write BW [MBytes/s] 1.0E-06*(MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0
+
+LONG
+Profiling group to measure the memory bandwidth drawn by all cores of a
+socket. Since this group is based on Uncore events it is only possible to
+measure on a per-socket basis. It also outputs the total data volume
+transferred from main memory.
+
diff --git a/groups/sandybridge/MEM_DP.txt b/groups/sandybridge/MEM_DP.txt
new file mode 100644
index 000000000..e2c513091
--- /dev/null
+++ b/groups/sandybridge/MEM_DP.txt
@@ -0,0 +1,49 @@
+SHORT Overview of arithmetic and main memory performance
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PWR0 PWR_PKG_ENERGY
+PWR3 PWR_DRAM_ENERGY
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE
+PMC2 FP_256_PACKED_DOUBLE
+MBOX0C0 CAS_COUNT_RD
+MBOX1C0 CAS_COUNT_WR
+MBOX0C1 CAS_COUNT_RD
+MBOX1C1 CAS_COUNT_WR
+MBOX0C2 CAS_COUNT_RD
+MBOX1C2 CAS_COUNT_WR
+MBOX0C3 CAS_COUNT_RD
+MBOX1C3 CAS_COUNT_WR
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Energy [J] PWR0
+Power [W] PWR0/time
+Energy DRAM [J] PWR3
+Power DRAM [W] PWR3/time
+MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+32b AVX MFlops/s 1.0E-06*(PMC2*4.0)/time
+Packed MUOPS/s 1.0E-06*(PMC0+PMC2)/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+Memory Read BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3)*64.0/time
+Memory Write BW [MBytes/s] 1.0E-06*(MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0
+
+LONG
+Formula:
+MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE*2 + FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE)/runtime
+32b AVX MFlops/s = (FP_256_PACKED_DOUBLE*4)/runtime
+--
+Profiling group to measure the memory bandwidth drawn by all cores of a
+socket. Since this group is based on Uncore events it is only possible to
+measure on a per-socket basis. It also outputs the total data volume
+transferred from main memory, together with SSE scalar and packed double
+precision flop rates and packed AVX 32b instructions. Please note that the
+current flop measurements on SandyBridge are potentially wrong, so you
+cannot trust these counters at the moment!
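The RAPL counters in the ENERGY and MEM_DP/MEM_SP groups report energy sums
in Joules, so average power and an energy-to-solution figure follow
directly. A sketch, all readings hypothetical:

# Illustration only: hypothetical RAPL readings for one measured interval.
pkg_energy = 150.0  # PWR0 PWR_PKG_ENERGY [J]
dram_energy = 30.0  # PWR3 PWR_DRAM_ENERGY [J]
time = 2.5          # seconds
mflops = 12_000.0   # MFlops/s from the same run

pkg_power = pkg_energy / time    # [W]
dram_power = dram_energy / time  # [W]
# Total work done: mflops * time MFlop; 1e-3 converts MFlop to GFlop.
energy_per_gflop = (pkg_energy + dram_energy) / (1.0e-3 * mflops * time)

print(f"Package power [W]: {pkg_power:.1f}")
print(f"DRAM power [W]: {dram_power:.1f}")
print(f"Energy per GFlop [J]: {energy_per_gflop:.1f}")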
diff --git a/groups/sandybridge/MEM_SP.txt b/groups/sandybridge/MEM_SP.txt
new file mode 100644
index 000000000..972ad98a7
--- /dev/null
+++ b/groups/sandybridge/MEM_SP.txt
@@ -0,0 +1,49 @@
+SHORT Overview of arithmetic and main memory performance
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PWR0 PWR_PKG_ENERGY
+PWR3 PWR_DRAM_ENERGY
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE
+PMC2 FP_256_PACKED_SINGLE
+MBOX0C0 CAS_COUNT_RD
+MBOX1C0 CAS_COUNT_WR
+MBOX0C1 CAS_COUNT_RD
+MBOX1C1 CAS_COUNT_WR
+MBOX0C2 CAS_COUNT_RD
+MBOX1C2 CAS_COUNT_WR
+MBOX0C3 CAS_COUNT_RD
+MBOX1C3 CAS_COUNT_WR
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Energy [J] PWR0
+Power [W] PWR0/time
+Energy DRAM [J] PWR3
+Power DRAM [W] PWR3/time
+MFlops/s 1.0E-06*(PMC0*4.0+PMC1)/time
+32b AVX MFlops/s 1.0E-06*(PMC2*8.0)/time
+Packed MUOPS/s 1.0E-06*(PMC0+PMC2)/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+Memory Read BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3)*64.0/time
+Memory Write BW [MBytes/s] 1.0E-06*(MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX0C1+MBOX0C2+MBOX0C3+MBOX1C0+MBOX1C1+MBOX1C2+MBOX1C3)*64.0
+
+LONG
+Formula:
+MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE * 4 + FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE) / runtime
+32b AVX MFlops/s = (FP_256_PACKED_SINGLE * 8) / runtime
+--
+Profiling group to measure the memory bandwidth drawn by all cores of a
+socket. Since this group is based on Uncore events it is only possible to
+measure on a per-socket basis. It also outputs the total data volume
+transferred from main memory, together with SSE scalar and packed single
+precision flop rates and packed AVX 32b instructions. Please note that the
+current flop measurements on SandyBridge are potentially wrong, so you
+cannot trust these counters at the moment!
diff --git a/groups/sandybridge/TLB.txt b/groups/sandybridge/TLB.txt
new file mode 100644
index 000000000..78bf096bb
--- /dev/null
+++ b/groups/sandybridge/TLB.txt
@@ -0,0 +1,22 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 DTLB_LOAD_MISSES_CAUSES_A_WALK
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L1 DTLB miss rate PMC0/FIXC0
+
+LONG
+Formulas:
+L1 DTLB miss rate = DTLB_LOAD_MISSES_CAUSES_A_WALK / INSTR_RETIRED_ANY
+-
+The DTLB miss rate gives a measure of how often a TLB miss occurred
+per instruction.
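Because this group only counts load misses that trigger a page walk, a
per-instruction normalization is all it derives. A minimal sketch, values
hypothetical:

# Illustration only: hypothetical counter readings.
walks = 500_000        # PMC0 DTLB_LOAD_MISSES_CAUSES_A_WALK
instr = 1_000_000_000  # FIXC0 INSTR_RETIRED_ANY

miss_rate = walks / instr
print(f"L1 DTLB miss rate: {miss_rate:.2e} walks/instruction")
# Comparing runs with small vs. huge pages: fewer walks usually means less
# address-translation overhead.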
+
diff --git a/groups/westmere/BRANCH.txt b/groups/westmere/BRANCH.txt
new file mode 100644
index 000000000..3d814167f
--- /dev/null
+++ b/groups/westmere/BRANCH.txt
@@ -0,0 +1,31 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 BR_INST_RETIRED_ALL_BRANCHES
+PMC1 BR_MISP_RETIRED_ALL_BRANCHES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Branch rate PMC0/FIXC0
+Branch misprediction rate PMC1/FIXC0
+Branch misprediction ratio PMC1/PMC0
+Instructions per branch FIXC0/PMC0
+
+LONG
+Formulas:
+Branch rate = BR_INST_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES / BR_INST_RETIRED_ALL_BRANCHES
+Instructions per branch = INSTR_RETIRED_ANY / BR_INST_RETIRED_ALL_BRANCHES
+-
+The rates state how often, on average, a branch or a mispredicted branch
+occurred per retired instruction. The branch misprediction ratio directly
+expresses what fraction of all branch instructions was mispredicted.
+Instructions per branch is 1/branch rate.
+
diff --git a/groups/westmere/CACHE.txt b/groups/westmere/CACHE.txt
new file mode 100644
index 000000000..4ceed06fa
--- /dev/null
+++ b/groups/westmere/CACHE.txt
@@ -0,0 +1,25 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPL
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Data cache misses PMC0
+Data cache miss rate PMC0/FIXC0
+
+LONG
+Formulas:
+Data cache miss rate = L1D_REPL / INSTR_RETIRED_ANY
+-
+This group measures the locality of your data accesses with regard to the
+L1 cache.
+The data cache miss rate indicates how often it was necessary to fetch
+cachelines from higher levels of the memory hierarchy.
+
diff --git a/groups/westmere/DATA.txt b/groups/westmere/DATA.txt
new file mode 100644
index 000000000..a5611bc19
--- /dev/null
+++ b/groups/westmere/DATA.txt
@@ -0,0 +1,22 @@
+SHORT Load to store ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 MEM_INST_RETIRED_LOADS
+PMC1 MEM_INST_RETIRED_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = MEM_INST_RETIRED_LOADS / MEM_INST_RETIRED_STORES
+-
+This is a simple metric to determine your load-to-store ratio.
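The distinction between the branch metrics is worth spelling out: the rates
are normalized to all retired instructions, while the ratio is normalized to
branches only and therefore isolates predictor quality from branch density. A
minimal sketch with hypothetical counter values:

/* Sketch of the BRANCH group arithmetic; values are hypothetical. */
#include <stdio.h>

int main(void)
{
    double instr_any = 1.0e10; /* INSTR_RETIRED_ANY            */
    double br_all    = 1.0e9;  /* BR_INST_RETIRED_ALL_BRANCHES */
    double br_misp   = 2.0e7;  /* BR_MISP_RETIRED_ALL_BRANCHES */

    printf("Branch rate:                %f\n", br_all / instr_any);
    printf("Branch misprediction rate:  %f\n", br_misp / instr_any);
    /* Ratio relates mispredictions to branches only. */
    printf("Branch misprediction ratio: %f\n", br_misp / br_all);
    printf("Instructions per branch:    %f\n", instr_any / br_all);
    return 0;
}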
+
diff --git a/groups/westmere/FLOPS_DP.txt b/groups/westmere/FLOPS_DP.txt
new file mode 100644
index 000000000..c5ba91c69
--- /dev/null
+++ b/groups/westmere/FLOPS_DP.txt
@@ -0,0 +1,31 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+DP MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+
+LONG
+Formula:
+DP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*2 + FP_COMP_OPS_EXE_SSE_FP_SCALAR) / runtime
+-
+The Nehalem architecture offers no possibility to measure MFlops when mixed
+precision calculations are done. Therefore both single and double precision
+counts are measured to ensure the correctness of the measurements. You can
+check whether your code was vectorized by comparing the number of
+FP_COMP_OPS_EXE_SSE_FP_PACKED with FP_COMP_OPS_EXE_SSE_FP_SCALAR.
+
diff --git a/groups/westmere/FLOPS_SP.txt b/groups/westmere/FLOPS_SP.txt
new file mode 100644
index 000000000..4478c8f38
--- /dev/null
+++ b/groups/westmere/FLOPS_SP.txt
@@ -0,0 +1,31 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+SP MFlops/s (SP assumed) 1.0E-06*(PMC0*4.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+
+LONG
+Formula:
+SP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*4 + FP_COMP_OPS_EXE_SSE_FP_SCALAR) / runtime
+-
+The Nehalem architecture offers no possibility to measure MFlops when mixed
+precision calculations are done. Therefore both single and double precision
+counts are measured to ensure the correctness of the measurements. You can
+check whether your code was vectorized by comparing the number of
+FP_COMP_OPS_EXE_SSE_FP_PACKED with FP_COMP_OPS_EXE_SSE_FP_SCALAR.
+
diff --git a/groups/westmere/FLOPS_X87.txt b/groups/westmere/FLOPS_X87.txt
new file mode 100644
index 000000000..6447b930e
--- /dev/null
+++ b/groups/westmere/FLOPS_X87.txt
@@ -0,0 +1,18 @@
+SHORT X87 MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 INST_RETIRED_X87
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+X87 MFlops/s 1.0E-06*PMC0/time
+
+LONG
+Profiling group to measure the X87 flop rate.
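The vectorization check suggested in the FLOPS groups above can be condensed
into a single derived number. This is not a metric the groups define; it is an
illustrative sketch of the comparison, with hypothetical counts:

/* Estimate the fraction of FP uops that are packed, as suggested by the
 * FLOPS group descriptions. A value close to 1.0 means the hot loops were
 * vectorized; close to 0.0 means mostly scalar code. Values hypothetical. */
#include <stdio.h>

int main(void)
{
    double packed = 8.0e8; /* FP_COMP_OPS_EXE_SSE_FP_PACKED */
    double scalar = 2.0e8; /* FP_COMP_OPS_EXE_SSE_FP_SCALAR */

    printf("Vectorization ratio: %.2f\n", packed / (packed + scalar));
    return 0;
}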
+
diff --git a/groups/westmere/L2.txt b/groups/westmere/L2.txt
new file mode 100644
index 000000000..5506f1f99
--- /dev/null
+++ b/groups/westmere/L2.txt
@@ -0,0 +1,32 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPL
+PMC1 L1D_M_EVICT
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L2 Load [MBytes/s] = 1.0E-06*L1D_REPL*64/time
+L2 Evict [MBytes/s] = 1.0E-06*L1D_M_EVICT*64/time
+L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPL+L1D_M_EVICT)*64/time
+L2 data volume [GBytes] = 1.0E-09*(L1D_REPL+L1D_M_EVICT)*64
+-
+Profiling group to measure the L2 cache bandwidth. The bandwidth is computed
+from the number of cachelines allocated in the L1 and the number of modified
+cachelines evicted from the L1. The group also reports the data volume
+transferred between the L2 and L1 caches. Note that this bandwidth also
+includes data transfers due to a write-allocate load on a store miss in L1.
+
diff --git a/groups/westmere/L2CACHE.txt b/groups/westmere/L2CACHE.txt
new file mode 100644
index 000000000..49778be04
--- /dev/null
+++ b/groups/westmere/L2CACHE.txt
@@ -0,0 +1,35 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_DATA_RQSTS_DEMAND_ANY
+PMC1 L2_RQSTS_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 request rate PMC0/FIXC0
+L2 miss rate PMC1/FIXC0
+L2 miss ratio PMC1/PMC0
+
+LONG
+Formulas:
+L2 request rate = L2_DATA_RQSTS_DEMAND_ANY / INSTR_RETIRED_ANY
+L2 miss rate = L2_RQSTS_MISS / INSTR_RETIRED_ANY
+L2 miss ratio = L2_RQSTS_MISS / L2_DATA_RQSTS_DEMAND_ANY
+-
+This group measures the locality of your data accesses with regard to the
+L2 cache. The L2 request rate tells you how data-intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The L2 miss rate indicates how often it was necessary to fetch
+cachelines from memory. Finally, the L2 miss ratio tells you how many of your
+memory references required a cacheline to be loaded from a higher level.
+While the miss rate might be dictated by your algorithm, you should
+try to keep the miss ratio as low as possible by increasing your cache reuse.
+Note: This group might need to be revised!
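The rate/ratio distinction made in the L2CACHE group recurs in all cache
groups: the miss rate is normalized to retired instructions, the miss ratio
to cache requests. A small sketch with hypothetical counter values:

/* Miss *rate* (per instruction) vs. miss *ratio* (per request), as in the
 * L2CACHE group above. Counter values are hypothetical. */
#include <stdio.h>

int main(void)
{
    double instr    = 1.0e10; /* INSTR_RETIRED_ANY        */
    double requests = 2.0e9;  /* L2_DATA_RQSTS_DEMAND_ANY */
    double misses   = 1.0e8;  /* L2_RQSTS_MISS            */

    /* Rate: driven largely by how data-intensive the algorithm is. */
    printf("L2 miss rate:  %f\n", misses / instr);
    /* Ratio: improves with better cache reuse, independent of branchless
     * instruction overhead. */
    printf("L2 miss ratio: %f\n", misses / requests);
    return 0;
}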
+
+
diff --git a/groups/westmere/L3.txt b/groups/westmere/L3.txt
new file mode 100644
index 000000000..6a58f78ab
--- /dev/null
+++ b/groups/westmere/L3.txt
@@ -0,0 +1,32 @@
+SHORT L3 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_LINES_IN_ANY
+PMC1 L2_LINES_OUT_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L3 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L3 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L3 Load [MBytes/s] = 1.0E-06*L2_LINES_IN_ANY*64/time
+L3 Evict [MBytes/s] = 1.0E-06*L2_LINES_OUT_ANY*64/time
+L3 bandwidth [MBytes/s] = 1.0E-06*(L2_LINES_IN_ANY+L2_LINES_OUT_ANY)*64/time
+L3 data volume [GBytes] = 1.0E-09*(L2_LINES_IN_ANY+L2_LINES_OUT_ANY)*64
+-
+Profiling group to measure the L3 cache bandwidth. The bandwidth is computed
+from the number of cachelines allocated in the L2 and the number of modified
+cachelines evicted from the L2. The group also reports the total data volume
+transferred between the L3 and the measured L2 cache. Note that this bandwidth
+also includes data transfers due to a write-allocate load on a store miss in
+L2.
+
diff --git a/groups/westmere/L3CACHE.txt b/groups/westmere/L3CACHE.txt
new file mode 100644
index 000000000..944bc97b7
--- /dev/null
+++ b/groups/westmere/L3CACHE.txt
@@ -0,0 +1,36 @@
+SHORT L3 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+UPMC0 UNC_L3_HITS_ANY
+UPMC1 UNC_L3_MISS_ANY
+UPMC2 UNC_L3_LINES_IN_ANY
+UPMC3 UNC_L3_LINES_OUT_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L3 request rate UPMC0/FIXC0
+L3 miss rate UPMC1/FIXC0
+L3 miss ratio UPMC1/(UPMC0+UPMC1)
+
+LONG
+Formulas:
+L3 request rate = UNC_L3_HITS_ANY / INSTR_RETIRED_ANY
+L3 miss rate = UNC_L3_MISS_ANY / INSTR_RETIRED_ANY
+L3 miss ratio = UNC_L3_MISS_ANY / (UNC_L3_HITS_ANY + UNC_L3_MISS_ANY)
+-
+This group measures the locality of your data accesses with regard to the L3
+cache. The L3 request rate tells you how data-intensive your code is, i.e. how
+many data accesses you have on average per instruction. The L3 miss rate
+indicates how often it was necessary to fetch cachelines from memory. Finally,
+the L3 miss ratio tells you how many of your memory references required a
+cacheline to be loaded from a higher level. While the miss rate might be
+dictated by your algorithm, you should try to keep the miss ratio as low as
+possible by increasing your cache reuse.
+
+
diff --git a/groups/westmere/MEM.txt b/groups/westmere/MEM.txt
new file mode 100644
index 000000000..f9e19ad6e
--- /dev/null
+++ b/groups/westmere/MEM.txt
@@ -0,0 +1,37 @@
+SHORT Main memory bandwidth
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+UPMC0 UNC_QMC_NORMAL_READS_ANY
+UPMC1 UNC_QMC_WRITES_FULL_ANY
+UPMC2 UNC_QHL_REQUESTS_REMOTE_READS
+UPMC3 UNC_QHL_REQUESTS_LOCAL_READS
+UPMC4 UNC_QHL_REQUESTS_REMOTE_WRITES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Memory bandwidth [MBytes/s] 1.0E-06*(UPMC0+UPMC1)*64/time
+Memory data volume [GBytes] 1.0E-09*(UPMC0+UPMC1)*64
+Remote Read BW [MBytes/s] 1.0E-06*(UPMC2)*64/time
+Remote Write BW [MBytes/s] 1.0E-06*(UPMC4)*64/time
+Remote BW [MBytes/s] 1.0E-06*(UPMC2+UPMC4)*64/time
+
+LONG
+Formulas:
+Memory bandwidth [MBytes/s] = 1.0E-06*(UNC_QMC_NORMAL_READS_ANY+UNC_QMC_WRITES_FULL_ANY)*64/time
+Memory data volume [GBytes] = 1.0E-09*(UNC_QMC_NORMAL_READS_ANY+UNC_QMC_WRITES_FULL_ANY)*64
+Remote Read BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_READS)*64/time
+Remote Write BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_WRITES)*64/time
+Remote BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_READS+UNC_QHL_REQUESTS_REMOTE_WRITES)*64/time
+-
+Profiling group to measure the memory bandwidth drawn by all cores of a
+socket. This group is measured by one core per socket. The Remote Read BW
+tells you if cachelines are transferred between sockets, meaning that cores
+access data owned by a remote NUMA domain. The group also reports the total
+data volume transferred from main memory.
+
diff --git a/groups/westmere/TLB.txt b/groups/westmere/TLB.txt
new file mode 100644
index 000000000..00773508c
--- /dev/null
+++ b/groups/westmere/TLB.txt
@@ -0,0 +1,22 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 DTLB_MISSES_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L1 DTLB miss rate PMC0/FIXC0
+
+LONG
+Formulas:
+DTLB miss rate = DTLB_MISSES_ANY / INSTR_RETIRED_ANY
+-
+The DTLB miss rate indicates how often a TLB miss occurred
+per retired instruction.
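The remote-traffic metrics in the westmere MEM group above are most useful as
a fraction of the total traffic, which indicates how well threads and their
data are co-located. A sketch of that interpretation, with hypothetical
counter values:

/* Judging NUMA placement from local vs. remote traffic, as measured by the
 * westmere MEM group. Counter values are hypothetical. */
#include <stdio.h>

int main(void)
{
    double reads_any  = 2.0e9; /* UNC_QMC_NORMAL_READS_ANY       */
    double writes_any = 5.0e8; /* UNC_QMC_WRITES_FULL_ANY        */
    double rem_reads  = 4.0e8; /* UNC_QHL_REQUESTS_REMOTE_READS  */
    double rem_writes = 1.0e8; /* UNC_QHL_REQUESTS_REMOTE_WRITES */
    double time       = 1.0;   /* seconds */

    double total_bw  = 1.0e-6 * (reads_any + writes_any) * 64.0 / time;
    double remote_bw = 1.0e-6 * (rem_reads + rem_writes) * 64.0 / time;

    printf("Memory BW [MBytes/s]: %.1f\n", total_bw);
    printf("Remote BW [MBytes/s]: %.1f\n", remote_bw);
    /* A large remote share means cores access data owned by another NUMA
     * domain, i.e. thread or memory pinning should be improved. */
    printf("Remote share: %.1f %%\n", 100.0 * remote_bw / total_bw);
    return 0;
}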
+
diff --git a/groups/westmere/VIEW.txt b/groups/westmere/VIEW.txt
new file mode 100644
index 000000000..a0708f4c7
--- /dev/null
+++ b/groups/westmere/VIEW.txt
@@ -0,0 +1,50 @@
+SHORT Overview of arithmetic and memory performance
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+UPMC0 UNC_QMC_NORMAL_READS_ANY
+UPMC1 UNC_QMC_WRITES_FULL_ANY
+UPMC2 UNC_QHL_REQUESTS_REMOTE_READS
+UPMC3 UNC_QHL_REQUESTS_LOCAL_READS
+UPMC4 UNC_QHL_REQUESTS_REMOTE_WRITES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+DP MFlops/s (DP assumed) 1.0E-06*(PMC0*2.0+PMC1)/time
+SP MFlops/s (SP assumed) 1.0E-06*(PMC0*4.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+Memory bandwidth [MBytes/s] 1.0E-06*(UPMC0+UPMC1)*64/time
+Memory data volume [GBytes] 1.0E-09*(UPMC0+UPMC1)*64
+Remote Read BW [MBytes/s] 1.0E-06*(UPMC2)*64/time
+Remote Write BW [MBytes/s] 1.0E-06*(UPMC4)*64/time
+Remote BW [MBytes/s] 1.0E-06*(UPMC2+UPMC4)*64/time
+
+LONG
+Formulas:
+DP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*2 + FP_COMP_OPS_EXE_SSE_FP_SCALAR) / runtime
+SP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*4 + FP_COMP_OPS_EXE_SSE_FP_SCALAR) / runtime
+Packed MUOPS/s = 1.0E-06*FP_COMP_OPS_EXE_SSE_FP_PACKED/time
+Scalar MUOPS/s = 1.0E-06*FP_COMP_OPS_EXE_SSE_FP_SCALAR/time
+SP MUOPS/s = 1.0E-06*FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION/time
+DP MUOPS/s = 1.0E-06*FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION/time
+Memory bandwidth [MBytes/s] = 1.0E-06*(UNC_QMC_NORMAL_READS_ANY+UNC_QMC_WRITES_FULL_ANY)*64/time
+Memory data volume [GBytes] = 1.0E-09*(UNC_QMC_NORMAL_READS_ANY+UNC_QMC_WRITES_FULL_ANY)*64
+Remote Read BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_READS)*64/time
+Remote Write BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_WRITES)*64/time
+Remote BW [MBytes/s] = 1.0E-06*(UNC_QHL_REQUESTS_REMOTE_READS+UNC_QHL_REQUESTS_REMOTE_WRITES)*64/time
+-
+This is an overview group that uses the capability of the Westmere
+architecture to measure multiple events at the same time.
+
diff --git a/groups/westmereEX/BRANCH.txt b/groups/westmereEX/BRANCH.txt
new file mode 100644
index 000000000..3d814167f
--- /dev/null
+++ b/groups/westmereEX/BRANCH.txt
@@ -0,0 +1,31 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 BR_INST_RETIRED_ALL_BRANCHES
+PMC1 BR_MISP_RETIRED_ALL_BRANCHES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Branch rate PMC0/FIXC0
+Branch misprediction rate PMC1/FIXC0
+Branch misprediction ratio PMC1/PMC0
+Instructions per branch FIXC0/PMC0
+
+LONG
+Formulas:
+Branch rate = BR_INST_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES / INSTR_RETIRED_ANY
+Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES / BR_INST_RETIRED_ALL_BRANCHES
+Instructions per branch = INSTR_RETIRED_ANY / BR_INST_RETIRED_ALL_BRANCHES
+-
+The rates state how often, on average, a branch or a mispredicted branch
+occurred per retired instruction. The branch misprediction ratio directly
+expresses what fraction of all branch instructions was mispredicted.
+Instructions per branch is 1/branch rate.
+
diff --git a/groups/westmereEX/CACHE.txt b/groups/westmereEX/CACHE.txt
new file mode 100644
index 000000000..490f982d3
--- /dev/null
+++ b/groups/westmereEX/CACHE.txt
@@ -0,0 +1,24 @@
+SHORT Data cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPL
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Data cache misses PMC0
+Data cache miss rate PMC0/FIXC0
+
+LONG
+Formulas:
+Data cache miss rate = L1D_REPL / INSTR_RETIRED_ANY
+-
+This group measures the locality of your data accesses with regard to the L1
+cache. The data cache miss rate indicates how often it was necessary to
+fetch cachelines from higher levels of the memory hierarchy.
+
diff --git a/groups/westmereEX/DATA.txt b/groups/westmereEX/DATA.txt
new file mode 100644
index 000000000..a5611bc19
--- /dev/null
+++ b/groups/westmereEX/DATA.txt
@@ -0,0 +1,22 @@
+SHORT Load to store ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 MEM_INST_RETIRED_LOADS
+PMC1 MEM_INST_RETIRED_STORES
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Load to Store ratio PMC0/PMC1
+
+LONG
+Formulas:
+Load to Store ratio = MEM_INST_RETIRED_LOADS / MEM_INST_RETIRED_STORES
+-
+This is a simple metric to determine your load-to-store ratio.
+
diff --git a/groups/westmereEX/FLOPS_DP.txt b/groups/westmereEX/FLOPS_DP.txt
new file mode 100644
index 000000000..a62cbe3fe
--- /dev/null
+++ b/groups/westmereEX/FLOPS_DP.txt
@@ -0,0 +1,31 @@
+SHORT Double Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+MFlops/s 1.0E-06*(PMC0*2.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+
+LONG
+Formula:
+DP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*2 + FP_COMP_OPS_EXE_SSE_FP_SCALAR) / runtime
+-
+The Nehalem architecture offers no possibility to measure MFlops when mixed
+precision calculations are done. Therefore both single and double precision
+counts are measured to ensure the correctness of the measurements. You can
+check whether your code was vectorized by comparing the number of
+FP_COMP_OPS_EXE_SSE_FP_PACKED with FP_COMP_OPS_EXE_SSE_FP_SCALAR.
+
diff --git a/groups/westmereEX/FLOPS_SP.txt b/groups/westmereEX/FLOPS_SP.txt
new file mode 100644
index 000000000..148561572
--- /dev/null
+++ b/groups/westmereEX/FLOPS_SP.txt
@@ -0,0 +1,31 @@
+SHORT Single Precision MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED
+PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR
+PMC2 FP_COMP_OPS_EXE_SSE_SINGLE_PRECISION
+PMC3 FP_COMP_OPS_EXE_SSE_DOUBLE_PRECISION
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+MFlops/s 1.0E-06*(PMC0*4.0+PMC1)/time
+Packed MUOPS/s 1.0E-06*PMC0/time
+Scalar MUOPS/s 1.0E-06*PMC1/time
+SP MUOPS/s 1.0E-06*PMC2/time
+DP MUOPS/s 1.0E-06*PMC3/time
+
+LONG
+Formula:
+SP MFlops/s = (FP_COMP_OPS_EXE_SSE_FP_PACKED*4 + FP_COMP_OPS_EXE_SSE_FP_SCALAR) / runtime
+-
+The Nehalem architecture offers no possibility to measure MFlops when mixed
+precision calculations are done. Therefore both single and double precision
+counts are measured to ensure the correctness of the measurements. You can
+check whether your code was vectorized by comparing the number of
+FP_COMP_OPS_EXE_SSE_FP_PACKED with FP_COMP_OPS_EXE_SSE_FP_SCALAR.
+
diff --git a/groups/westmereEX/FLOPS_X87.txt b/groups/westmereEX/FLOPS_X87.txt
new file mode 100644
index 000000000..6447b930e
--- /dev/null
+++ b/groups/westmereEX/FLOPS_X87.txt
@@ -0,0 +1,18 @@
+SHORT X87 MFlops/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 INST_RETIRED_X87
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+X87 MFlops/s 1.0E-06*PMC0/time
+
+LONG
+Profiling group to measure the X87 flop rate.
+
diff --git a/groups/westmereEX/L2.txt b/groups/westmereEX/L2.txt
new file mode 100644
index 000000000..9201cd0fa
--- /dev/null
+++ b/groups/westmereEX/L2.txt
@@ -0,0 +1,32 @@
+SHORT L2 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L1D_REPL
+PMC1 L1D_M_EVICT
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L2 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L2 Load [MBytes/s] = 1.0E-06*L1D_REPL*64/time
+L2 Evict [MBytes/s] = 1.0E-06*L1D_M_EVICT*64/time
+L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPL+L1D_M_EVICT)*64/time
+L2 data volume [GBytes] = 1.0E-09*(L1D_REPL+L1D_M_EVICT)*64
+-
+Profiling group to measure the L2 cache bandwidth. The bandwidth is computed
+from the number of cachelines allocated in the L1 and the number of modified
+cachelines evicted from the L1. It also reports the total data volume
+transferred between the L2 and L1 caches. Note that this bandwidth also
+includes data transfers due to a write-allocate load on a store miss in L1.
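The write-allocate caveat in the L2 group has a concrete consequence: for a
pure store stream (e.g. a memset-like kernel) every store miss first loads the
line into L1 before it is later evicted, so the measured L1/L2 traffic is
twice the bytes the code visibly stores. A back-of-the-envelope sketch,
assuming a hypothetical N-byte store stream and 64-byte cachelines:

/* Illustration of why write-allocate doubles the measured L2 traffic for a
 * pure store stream. The byte count is hypothetical. */
#include <stdio.h>

int main(void)
{
    double n_bytes = 1.0e9; /* bytes written by the kernel under test */

    double evicted   = n_bytes / 64.0; /* L1D_M_EVICT: modified lines out  */
    double allocated = n_bytes / 64.0; /* L1D_REPL: write-allocate loads in */

    /* The group sums both directions, so the reported data volume is 2x
     * the stored bytes. */
    printf("L2 data volume [GBytes]: %.2f\n",
           1.0e-9 * (evicted + allocated) * 64.0);
    return 0;
}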
+
diff --git a/groups/westmereEX/L2CACHE.txt b/groups/westmereEX/L2CACHE.txt
new file mode 100644
index 000000000..49778be04
--- /dev/null
+++ b/groups/westmereEX/L2CACHE.txt
@@ -0,0 +1,35 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_DATA_RQSTS_DEMAND_ANY
+PMC1 L2_RQSTS_MISS
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L2 request rate PMC0/FIXC0
+L2 miss rate PMC1/FIXC0
+L2 miss ratio PMC1/PMC0
+
+LONG
+Formulas:
+L2 request rate = L2_DATA_RQSTS_DEMAND_ANY / INSTR_RETIRED_ANY
+L2 miss rate = L2_RQSTS_MISS / INSTR_RETIRED_ANY
+L2 miss ratio = L2_RQSTS_MISS / L2_DATA_RQSTS_DEMAND_ANY
+-
+This group measures the locality of your data accesses with regard to the
+L2 cache. The L2 request rate tells you how data-intensive your code is,
+i.e. how many data accesses you have on average per instruction.
+The L2 miss rate indicates how often it was necessary to fetch
+cachelines from memory. Finally, the L2 miss ratio tells you how many of your
+memory references required a cacheline to be loaded from a higher level.
+While the miss rate might be dictated by your algorithm, you should
+try to keep the miss ratio as low as possible by increasing your cache reuse.
+Note: This group might need to be revised!
+
+
diff --git a/groups/westmereEX/L3.txt b/groups/westmereEX/L3.txt
new file mode 100644
index 000000000..f80761a81
--- /dev/null
+++ b/groups/westmereEX/L3.txt
@@ -0,0 +1,32 @@
+SHORT L3 cache bandwidth in MBytes/s
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 L2_LINES_IN_ANY
+PMC1 L2_LINES_OUT_DEMAND_DIRTY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L3 Load [MBytes/s] 1.0E-06*PMC0*64.0/time
+L3 Evict [MBytes/s] 1.0E-06*PMC1*64.0/time
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L3 Load [MBytes/s] = 1.0E-06*L2_LINES_IN_ANY*64/time
+L3 Evict [MBytes/s] = 1.0E-06*L2_LINES_OUT_DEMAND_DIRTY*64/time
+L3 bandwidth [MBytes/s] = 1.0E-06*(L2_LINES_IN_ANY+L2_LINES_OUT_DEMAND_DIRTY)*64/time
+L3 data volume [GBytes] = 1.0E-09*(L2_LINES_IN_ANY+L2_LINES_OUT_DEMAND_DIRTY)*64
+-
+Profiling group to measure the L3 cache bandwidth. The bandwidth is
+computed from the number of cachelines allocated in the L2 and the number of
+modified cachelines evicted from the L2. It also reports the data volume
+transferred between the L3 and L2 caches. Note that this bandwidth also
+includes data transfers due to a write-allocate load on a store miss in L2.
+
diff --git a/groups/westmereEX/MEM.txt b/groups/westmereEX/MEM.txt
new file mode 100644
index 000000000..defa391d3
--- /dev/null
+++ b/groups/westmereEX/MEM.txt
@@ -0,0 +1,37 @@
+SHORT Main memory bandwidth
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+MBOX0C0 FVC_EV0_BBOX_CMDS_READS
+MBOX0C1 FVC_EV0_BBOX_RSP_ACK
+MBOX1C0 FVC_EV0_BBOX_CMDS_READS
+MBOX1C1 FVC_EV0_BBOX_RSP_ACK
+BBOX0C1 IMT_INSERTS_WR
+BBOX1C1 IMT_INSERTS_WR
+RBOX0C0 NEW_PACKETS_RECV_PORT0_IPERF0_ANY_DRS
+RBOX0C1 NEW_PACKETS_RECV_PORT1_IPERF0_ANY_DRS
+RBOX1C0 NEW_PACKETS_RECV_PORT4_IPERF0_ANY_DRS
+RBOX1C1 NEW_PACKETS_RECV_PORT5_IPERF0_ANY_DRS
+
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+Memory Read BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0)*64/time
+Memory Write BW [MBytes/s] 1.0E-06*(BBOX0C1+BBOX1C1)*64/time
+Memory BW [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+BBOX0C1+BBOX1C1)*64/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+BBOX0C1+BBOX1C1)*64
+Remote write data traffic Port 0 [MBytes/s] 1.0E-06*(RBOX0C0)*64/time
+Remote write data traffic Port 1 [MBytes/s] 1.0E-06*(RBOX0C1)*64/time
+Remote write data traffic Port 4 [MBytes/s] 1.0E-06*(RBOX1C0)*64/time
+Remote write data traffic Port 5 [MBytes/s] 1.0E-06*(RBOX1C1)*64/time
+
+LONG
+Profiling group to measure the memory bandwidth drawn by all cores of a
+socket. In addition to the bandwidth it also outputs the data volume and the
+remote traffic over the QPI links to other sockets.
+
diff --git a/groups/westmereEX/TLB.txt b/groups/westmereEX/TLB.txt
new file mode 100644
index 000000000..00773508c
--- /dev/null
+++ b/groups/westmereEX/TLB.txt
@@ -0,0 +1,22 @@
+SHORT TLB miss rate/ratio
+
+EVENTSET
+FIXC0 INSTR_RETIRED_ANY
+FIXC1 CPU_CLK_UNHALTED_CORE
+FIXC2 CPU_CLK_UNHALTED_REF
+PMC0 DTLB_MISSES_ANY
+
+METRICS
+Runtime (RDTSC) [s] time
+Runtime unhalted [s] FIXC1*inverseClock
+Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
+CPI FIXC1/FIXC0
+L1 DTLB miss rate PMC0/FIXC0
+
+LONG
+Formulas:
+DTLB miss rate = DTLB_MISSES_ANY / INSTR_RETIRED_ANY
+-
+The DTLB miss rate indicates how often a TLB miss occurred
+per retired instruction.
+
diff --git a/kernel/Makefile b/kernel/Makefile
new file mode 100644
index 000000000..170841da7
--- /dev/null
+++ b/kernel/Makefile
@@ -0,0 +1,10 @@
+obj-m := enable_rdpmc.o
+KERNELDIR ?= /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+
+all:
+	$(MAKE) -Wpacked -C $(KERNELDIR) M=$(PWD) modules
+	chmod 666 enable_rdpmc.ko
+
+clean:
+	rm -f *.ko *.o
diff --git a/kernel/enable_rdpmc.c b/kernel/enable_rdpmc.c
new file mode 100644
index 000000000..aa04e6126
--- /dev/null
+++ b/kernel/enable_rdpmc.c
@@ -0,0 +1,65 @@
+/*
+ * Enable reading PMCs (RDPMC) from user space.
+ */
+#include <linux/module.h>	/* Needed by all modules */
+#include <linux/kernel.h>	/* Needed for KERN_INFO */
+
+
+
+
+static void printc4(void) {
+    uint64_t output;
+    // Read back CR4 to check the bit.
+ __asm__("\t mov %%cr4,%0" : "=r"(output)); + printk(KERN_INFO "%llu", output); +} + +static void setc4b8(void * info) { + // Set CR4, Bit 8 (9th bit from the right) to enable + __asm__("push %rax\n\t" + "mov %cr4,%rax;\n\t" + "or $(1 << 8),%rax;\n\t" + "mov %rax,%cr4;\n\t" + "wbinvd\n\t" + "pop %rax" + ); + + // Check which CPU we are on: + printk(KERN_INFO "Ran on Processor %d", smp_processor_id()); + //printc4(); +} + +static void clearc4b8(void * info) { + printc4(); + __asm__("push %rax\n\t" + "push %rbx\n\t" + "mov %cr4,%rax;\n\t" + "mov $(1 << 8), %rbx\n\t" + "not %rbx\n\t" + "and %rbx, %rax;\n\t" + "mov %rax,%cr4;\n\t" + "wbinvd\n\t" + "pop %rbx\n\t" + "pop %rax\n\t" + ); + printk(KERN_INFO "Ran on Processor %d", smp_processor_id()); +} + + + +int start_module(void) +{ + on_each_cpu(setc4b8, NULL, 0); + return 0; +} +void stop_module(void) +{ + on_each_cpu(clearc4b8, NULL, 0); +} + +module_init(start_module); +module_exit(stop_module) + +MODULE_AUTHOR("Thomas Roehl "); +MODULE_DESCRIPTION("Enable RDPMC from userspace"); +MODULE_LICENSE("GPL"); diff --git a/kernel/enable_rdpmc.mod.c b/kernel/enable_rdpmc.mod.c new file mode 100644 index 000000000..10bd68c1e --- /dev/null +++ b/kernel/enable_rdpmc.mod.c @@ -0,0 +1,32 @@ +#include +#include +#include + +MODULE_INFO(vermagic, VERMAGIC_STRING); + +struct module __this_module +__attribute__((section(".gnu.linkonce.this_module"))) = { + .name = KBUILD_MODNAME, + .init = init_module, +#ifdef CONFIG_MODULE_UNLOAD + .exit = cleanup_module, +#endif + .arch = MODULE_ARCH_INIT, +}; + +static const struct modversion_info ____versions[] +__used +__attribute__((section("__versions"))) = { + { 0xec4b56e8, "module_layout" }, + { 0x47c7b0d2, "cpu_number" }, + { 0x27e1a049, "printk" }, + { 0x5541ea93, "on_each_cpu" }, +}; + +static const char __module_depends[] +__used +__attribute__((section(".modinfo"))) = +"depends="; + + +MODULE_INFO(srcversion, "49F1A231DE25E13CB08B394"); diff --git a/make/config_checks.mk b/make/config_checks.mk new file mode 100644 index 000000000..dc91bb076 --- /dev/null +++ b/make/config_checks.mk @@ -0,0 +1,33 @@ + +# determine kernel Version +KERNEL_VERSION := $(shell uname -r | cut -d'.' -f3 | cut -d'-' -f1) +KERNEL_VERSION_MAJOR := $(shell uname -r | cut -d'.' -f1) + +HAS_MEMPOLICY = $(shell if [ $(KERNEL_VERSION) -lt 7 -a $(KERNEL_VERSION_MAJOR) -lt 3 ]; then \ + echo 0; else echo 1; \ + fi; ) + +HAS_RDTSCP = $(shell /bin/bash -c "cat /proc/cpuinfo | grep -c rdtscp") + +# determine glibc Version +GLIBC_VERSION := $(shell ldd --version | grep ldd | awk '{ print $$NF }' | awk -F. '{ print $$2 }') + +HAS_SCHEDAFFINITY = $(shell if [ $(GLIBC_VERSION) -lt 4 ]; then \ + echo 0; else echo 1; \ + fi; ) + + +ifneq ($(FORTRAN_INTERFACE),false) +HAS_FORTRAN_COMPILER = $(shell $(FC) --version 2>/dev/null || echo 'NOFORTRAN' ) +ifeq ($(HAS_FORTRAN_COMPILER),NOFORTRAN) +FORTRAN_INTERFACE= +$(info Warning: You have selected the fortran interface in config.mk, but there seems to be no fortran compiler - not compiling it!) 
+else
+FORTRAN_INTERFACE = likwid.mod
+FORTRAN_INSTALL = @cp -f likwid.mod $(PREFIX)/include/
+endif
+else
+FORTRAN_INTERFACE =
+FORTRAN_INSTALL =
+endif
+
diff --git a/make/config_defines.mk b/make/config_defines.mk
new file mode 100644
index 000000000..a97d71fb6
--- /dev/null
+++ b/make/config_defines.mk
@@ -0,0 +1,59 @@
+DEFINES += -DVERSION=$(VERSION) \
+    -DRELEASE=$(RELEASE) \
+    -DCFGFILE=$(CFG_FILE_PATH) \
+    -DMAX_NUM_THREADS=$(MAX_NUM_THREADS) \
+    -DMAX_NUM_NODES=$(MAX_NUM_NODES) \
+    -DHASH_TABLE_SIZE=$(HASH_TABLE_SIZE) \
+    -DLIBLIKWIDPIN=$(LIBLIKWIDPIN) \
+    -DLIKWIDFILTERPATH=$(LIKWIDFILTERPATH) \
+    -D_GNU_SOURCE
+
+ifneq ($(COLOR),NONE)
+DEFINES += -DCOLOR=$(COLOR)
+endif
+
+ifeq ($(BUILDDAEMON),true)
+    DAEMON_TARGET = likwid-accessD
+endif
+
+ifeq ($(INSTRUMENT_BENCH),true)
+DEFINES += -DPERFMON
+endif
+
+ifeq ($(HAS_MEMPOLICY),1)
+DEFINES += -DHAS_MEMPOLICY
+else
+$(info Kernel 2.6.$(KERNEL_VERSION) has no mempolicy support!);
+endif
+
+ifeq ($(HAS_RDTSCP),0)
+$(info Building without RDTSCP timing support!);
+else
+DEFINES += -DHAS_RDTSCP
+endif
+
+ifeq ($(HAS_SCHEDAFFINITY),1)
+DEFINES += -DHAS_SCHEDAFFINITY
+PINLIB = liblikwidpin.so
+else
+$(info GLIBC version 2.$(GLIBC_VERSION) has no pthread_setaffinity_np support!);
+PINLIB =
+endif
+
+ifeq ($(USE_HWLOC),true)
+DEFINES += -DLIKWID_USE_HWLOC
+endif
+
+DEFINES += -DACCESSDAEMON=$(ACCESSDAEMON)
+
+ifeq ($(ACCESSMODE),sysdaemon)
+DEFINES += -DACCESSMODE=2
+else
+ifeq ($(ACCESSMODE),accessdaemon)
+DEFINES += -DACCESSMODE=1
+else
+DEFINES += -DACCESSMODE=0
+endif
+endif
+
+
diff --git a/make/include_GCC.mk b/make/include_GCC.mk
new file mode 100644
index 000000000..fe56a540a
--- /dev/null
+++ b/make/include_GCC.mk
@@ -0,0 +1,33 @@
+CC = gcc
+FC = ifort
+AS = as
+AR = ar
+PAS = ./perl/AsmGen.pl
+GEN_PAS = ./perl/generatePas.pl
+GEN_GROUPS = ./perl/generateGroups.pl
+GEN_PMHEADER = ./perl/gen_events.pl
+
+ANSI_CFLAGS =
+#ANSI_CFLAGS += -pedantic
+#ANSI_CFLAGS += -Wextra
+#ANSI_CFLAGS += -Wall
+
+CFLAGS = -O2 -std=c99 -Wno-format
+FCFLAGS = -module ./ # ifort
+#FCFLAGS = -J ./ -fsyntax-only #gfortran
+PASFLAGS = x86-64
+ASFLAGS =
+CPPFLAGS =
+LFLAGS = -pthread
+
+SHARED_CFLAGS = -fpic
+SHARED_LFLAGS = -shared
+
+DEFINES = -DPAGE_ALIGNMENT=4096
+DEFINES += -DLIKWID_MONITOR_LOCK
+DEFINES += -DDEBUGLEV=0
+
+INCLUDES =
+LIBS = -lm
+
+
diff --git a/make/include_GCCX86.mk b/make/include_GCCX86.mk
new file mode 100644
index 000000000..2d4430168
--- /dev/null
+++ b/make/include_GCCX86.mk
@@ -0,0 +1,32 @@
+CC = gcc
+AS = as
+AR = ar
+PAS = ./perl/AsmGen.pl
+GEN_PAS = ./perl/generatePas.pl
+GEN_GROUPS = ./perl/generateGroups.pl
+GEN_PMHEADER = ./perl/gen_events.pl
+
+ANSI_CFLAGS = -std=c99
+#ANSI_CFLAGS += -pedantic
+#ANSI_CFLAGS += -Wextra
+#ANSI_CFLAGS += -Wall
+
+CFLAGS = -O2 -g -m32 -Wno-format
+FCFLAGS = -J ./ -fsyntax-only
+PASFLAGS = x86
+ASFLAGS = --32 -g
+CPPFLAGS =
+LFLAGS = -m32 -g -pthread
+
+SHARED_CFLAGS = -fpic
+SHARED_LFLAGS = -shared
+
+DEFINES = -D_GNU_SOURCE
+DEFINES += -DPAGE_ALIGNMENT=4096
+DEFINES += -DLIKWID_MONITOR_LOCK
+DEFINES += -DDEBUGLEV=0
+
+INCLUDES =
+LIBS = -lm
+
+
diff --git a/make/include_ICC.mk b/make/include_ICC.mk
new file mode 100644
index 000000000..b379daab4
--- /dev/null
+++ b/make/include_ICC.mk
@@ -0,0 +1,32 @@
+CC = icc
+FC = ifort
+AS = as
+AR = ar
+PAS = ./perl/AsmGen.pl
+GEN_PAS = ./perl/generatePas.pl
+GEN_GROUPS = ./perl/generateGroups.pl
+GEN_PMHEADER = ./perl/gen_events.pl
+
+ANSI_CFLAGS = -strict-ansi
+ANSI_CFLAGS += -std=c99
+
+CFLAGS = -O1 -Wno-format -vec-report=0
+FCFLAGS = -module ./
+ASFLAGS = -gdwarf-2 +PASFLAGS = x86-64 +CPPFLAGS = +LFLAGS = -pthread + +SHARED_CFLAGS = -fpic +SHARED_LFLAGS = -shared + +DEFINES = -D_GNU_SOURCE +DEFINES += -DMAX_NUM_THREADS=128 +DEFINES += -DPAGE_ALIGNMENT=4096 +#enable this option to build likwid-bench with marker API for likwid-perfctr +#DEFINES += -DPERFMON + +INCLUDES = +LIBS = + + diff --git a/make/include_MIC.mk b/make/include_MIC.mk new file mode 100644 index 000000000..17276e873 --- /dev/null +++ b/make/include_MIC.mk @@ -0,0 +1,33 @@ +CC = icc +FC = gfortran +AS = icc +AR = ar +PAS = ./perl/AsmGen.pl +GEN_PAS = ./perl/generatePas.pl +GEN_GROUPS = ./perl/generateGroups.pl +GEN_PMHEADER = ./perl/gen_events.pl + +ANSI_CFLAGS = -std=c99 +ANSI_CFLAGS += -pedantic +#ANSI_CFLAGS += -Wextra +#ANSI_CFLAGS += -Wall + +CFLAGS = -mmic -O0 -g -Wno-format +FCFLAGS = -J ./ -fsyntax-only +#FCFLAGS = -module ./ +ASFLAGS = -mmic -c +PASFLAGS = x86-64 +CPPFLAGS = +LFLAGS = -pthread -g -mmic + +SHARED_CFLAGS = -fpic -mmic +SHARED_LFLAGS = -shared -mmic + +DEFINES = -D_GNU_SOURCE +DEFINES += -DPAGE_ALIGNMENT=4096 +DEFINES += -DDEBUGLEV=0 + +INCLUDES = +LIBS = -lm + + diff --git a/perl/AsmGen.pl b/perl/AsmGen.pl new file mode 100755 index 000000000..dcd79463e --- /dev/null +++ b/perl/AsmGen.pl @@ -0,0 +1,284 @@ +#!/usr/bin/perl -w +use strict; +no strict "refs"; +use warnings; +use lib './perl'; +use Parse::RecDescent; +use Data::Dumper; +use Getopt::Std; +use Cwd 'abs_path'; + +use gas; + +my $ROOT = abs_path('./'); +my $DEBUG=0; +my $VERBOSE=0; +our $ISA = 'x86'; +our $AS = 'gas'; +my $OPT_STRING = 'hpvda:i:o:'; +my %OPT; +my $INPUTFILE; +my $OUTPUTFILE; +my $CPP_ARGS=''; + +# Enable warnings within the Parse::RecDescent module. +$::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error +#$::RD_WARN = 1; # Enable warnings. This will warn on unused rules &c. +#$::RD_HINT = 1; # Give out hints to help fix problems. +#$::RD_TRACE = 1; # if defined, also trace parsers' behaviour +$::RD_AUTOACTION = q { [@item[0..$#item]] }; + +sub init +{ + getopts( "$OPT_STRING", \%OPT ) or usage(); + if ($OPT{h}) { usage(); }; + if ($OPT{v}) { $VERBOSE = 1;} + if ($OPT{d}) { $DEBUG = 1;} + + if (! 
$ARGV[0]) { + die "ERROR: Please specify a input file!\n\nCall script with argument -h for help.\n"; + } + + $INPUTFILE = $ARGV[0]; + $CPP_ARGS = $ARGV[1] if ($ARGV[1]); + + if ($INPUTFILE =~ /.pas$/) { + $INPUTFILE =~ s/\.pas//; + } else { + die "ERROR: Input file must have pas ending!\n"; + } + if ($OPT{o}) { + $OUTPUTFILE = $OPT{o}; + }else { + $OUTPUTFILE = "$INPUTFILE.s"; + } + if ($OPT{i}) { + $ISA = $OPT{i}; + print "INFO: Using isa $ISA.\n\n" if ($VERBOSE); + } else { + print "INFO: No isa specified.\n Using default $ISA.\n\n" if ($VERBOSE); + } + if ($OPT{a}) { + $AS = $OPT{a}; + print "INFO: Using as $AS.\n\n" if ($VERBOSE); + } else { + print "INFO: No as specified.\n Using default $AS.\n\n" if ($VERBOSE); + } + + as::isa_init(); +} + +sub usage +{ + print < + +Required: + : Input pas file + +Optional: +-h : this (help) message +-v : verbose output +-d : debug mode: prints out the parse tree +-p : Print out intermediate preprocessed output +-o : Output file +-a : Specify different assembler (Default: gas) +-i : Specify different isa (Default: x86) + +Example: +$0 -i x86-64 -a masm -o out.s myfile.pas + +END + +exit(0); +} + +#======================================= +# GRAMMAR +#======================================= +$main::grammar = <<'_EOGRAMMAR_'; +# Terminals +FUNC : /func/i +LOOP : /loop/i +ALLOCATE : /allocate/i +FACTOR : /factor/i +DEFINE : /define/i +USE : /use/i +STOP : /stop/i +START : /start/i +LOCAL : /local/i +TIMER : /timer/i +INCREMENT : /increment/i +ALIGN : /align/i +INT : /int/i +SINGLE : /single/i +DOUBLE : /double/i +INUMBER : NUMBER +UNUMBER : NUMBER +SNUMBER : NUMBER +FNUMBER : NUMBER +OFFSET : /([0-9]+\,){15}[0-9]+/ +NUMBER : /[-+]?[0-9]*\.?[0-9]+/ +SYMBOL : /[.A-Z-a-z_][A-Za-z0-9_]*/ +REG : /GPR[0-9]+/i +SREG : /GPR[0-9]+/i +COMMENT : /#.*/ +{'skip'} + +type: SINGLE + |DOUBLE + |INT + +align: ALIGN NUMBER +{ +{FUNC => 'as::align', + ARGS => ["$item{NUMBER}[1]"]} +} + +ASMCODE : /[A-Za-z1-9.:]+.*/ +{ +{FUNC => 'as::emit_code', + ARGS => [$item[1]]} +} + +function: FUNC SYMBOL block +{[ + {FUNC => 'as::function_entry', + ARGS => [$item{SYMBOL}[1],0]}, + $item{block}, + {FUNC => 'as::function_exit', + ARGS => [$item{SYMBOL}[1]]} +]} + +function_allocate: FUNC SYMBOL ALLOCATE NUMBER block +{[ + {FUNC => 'as::function_entry', + ARGS => [$item{SYMBOL}[1],$item{NUMBER}[1]]}, + $item{block}, + {FUNC => 'as::function_exit', + ARGS => [$item{SYMBOL}[1]]} +]} + +loop: LOOP SYMBOL INUMBER SNUMBER block +{[ +{FUNC => 'as::loop_entry', + ARGS => [$item{SYMBOL}[1],$item{SNUMBER}[1][1]]}, + $item{block}, +{FUNC => 'as::loop_exit', + ARGS => [$item{SYMBOL}[1],$item{INUMBER}[1][1]]} +]} +| LOOP SYMBOL INUMBER SREG block +{[ +{FUNC => 'as::loop_entry', + ARGS => [$item{SYMBOL}[1],$item{SREG}[1]]}, + $item{block}, +{FUNC => 'as::loop_exit', + ARGS => [$item{SYMBOL}[1],$item{INUMBER}[1][1]]} +]} + +timer: START TIMER +{ +{FUNC => 'isa::start_timer', + ARGS => []} +} +| STOP TIMER +{ +{FUNC => 'isa::stop_timer', + ARGS => []} +} + +mode: START LOCAL +{ +{FUNC => 'as::mode', + ARGS => [$item[1][1]]} +} +| STOP LOCAL +{ +{FUNC => 'as::mode', + ARGS => [$item[1][1]]} +} + +block: '{' expression(s) '}' +{ $item[2] } + +define_data: DEFINE type SYMBOL OFFSET +{ +{FUNC => 'as::define_offset', + ARGS => [$item{SYMBOL}[1], $item{type}[1][1],"$item{OFFSET}[1]"]} +} + +define_data: DEFINE type SYMBOL NUMBER +{ +{FUNC => 'as::define_data', + ARGS => [$item{SYMBOL}[1], $item{type}[1][1],"$item{NUMBER}[1]"]} +} + + +expression: align + |COMMENT + |loop + |timer + |mode + |ASMCODE +{ 
$item[1] } + +instruction : define_data + | align + | COMMENT + | mode + | function + | function_allocate +{ $item[1] } + +startrule: instruction(s) +{ $item[1] } + +_EOGRAMMAR_ + + +#======================================= +# MAIN +#======================================= +init(); +print "INFO: Calling cpp with arguments $CPP_ARGS.\n" if ($VERBOSE); +my $text = `cpp -x assembler-with-cpp $CPP_ARGS $INPUTFILE.pas`; + +if ($OPT{p}) { + open FILE,">$INPUTFILE.Pas"; + print FILE $text; + close FILE; +} + +open STDOUT,">$OUTPUTFILE"; +print "$as::AS->{HEADER}\n"; + +my $parser = new Parse::RecDescent ($main::grammar) or die "ERROR: Bad grammar!\n"; +my $parse_tree = $parser->startrule($text) or print STDERR "ERROR: Syntax Error\n"; +tree_exec($parse_tree); + +if ($DEBUG) { + open FILE,'>parse_tree.txt'; + print FILE Dumper $parse_tree,"\n"; + close FILE; +} + +print "$as::AS->{FOOTER}\n"; + +sub tree_exec +{ + my $tree = shift; + + foreach my $node (@$tree) { + if ($node !~ /^skip|^instruction|^expression|^loop/) { + if (ref($node) eq 'ARRAY') { + tree_exec($node); + }else { + if (ref($node) eq 'HASH') { + &{$node->{FUNC}}(@{$node->{ARGS}}); + } + } + } + } +} + + diff --git a/perl/Parse/RecDescent.pm b/perl/Parse/RecDescent.pm new file mode 100644 index 000000000..35b9e9d2c --- /dev/null +++ b/perl/Parse/RecDescent.pm @@ -0,0 +1,3045 @@ +# GENERATE RECURSIVE DESCENT PARSER OBJECTS FROM A GRAMMARC +# SEE RecDescent.pod FOR FULL DETAILS + +use 5.005; +use strict; + +package Parse::RecDescent; + +use Text::Balanced qw ( extract_codeblock extract_bracketed extract_quotelike extract_delimited ); + +use vars qw ( $skip ); + + *defskip = \ '\s*'; # DEFAULT SEPARATOR IS OPTIONAL WHITESPACE + $skip = '\s*'; # UNIVERSAL SEPARATOR IS OPTIONAL WHITESPACE +my $MAXREP = 100_000_000; # REPETITIONS MATCH AT MOST 100,000,000 TIMES + + +sub import # IMPLEMENT PRECOMPILER BEHAVIOUR UNDER: + # perl -MParse::RecDescent - +{ + local *_die = sub { print @_, "\n"; exit }; + + my ($package, $file, $line) = caller; + if (substr($file,0,1) eq '-' && $line == 0) + { + _die("Usage: perl -MLocalTest - ") + unless @ARGV == 2; + + my ($sourcefile, $class) = @ARGV; + + local *IN; + open IN, $sourcefile + or _die("Can't open grammar file '$sourcefile'"); + + my $grammar = join '', ; + + Parse::RecDescent->Precompile($grammar, $class, $sourcefile); + exit; + } +} + +sub Save +{ + my ($self, $class) = @_; + $self->{saving} = 1; + $self->Precompile(undef,$class); + $self->{saving} = 0; +} + +sub Precompile +{ + my ($self, $grammar, $class, $sourcefile) = @_; + + $class =~ /^(\w+::)*\w+$/ or croak("Bad class name: $class"); + + my $modulefile = $class; + $modulefile =~ s/.*:://; + $modulefile .= ".pm"; + + open OUT, ">$modulefile" + or croak("Can't write to new module file '$modulefile'"); + + print STDERR "precompiling grammar from file '$sourcefile'\n", + "to class $class in module file '$modulefile'\n" + if $grammar && $sourcefile; + + # local $::RD_HINT = 1; + $self = Parse::RecDescent->new($grammar,1,$class) + || croak("Can't compile bad grammar") + if $grammar; + + foreach ( keys %{$self->{rules}} ) + { $self->{rules}{$_}{changed} = 1 } + + print OUT "package $class;\nuse Parse::RecDescent;\n\n"; + + print OUT "{ my \$ERRORS;\n\n"; + + print OUT $self->_code(); + + print OUT "}\npackage $class; sub new { "; + print OUT "my "; + + require Data::Dumper; + print OUT Data::Dumper->Dump([$self], [qw(self)]); + + print OUT "}"; + + close OUT + or croak("Can't write to new module file '$modulefile'"); +} + + +package 
Parse::RecDescent::LineCounter; + + +sub TIESCALAR # ($classname, \$text, $thisparser, $prevflag) +{ + bless { + text => $_[1], + parser => $_[2], + prev => $_[3]?1:0, + }, $_[0]; +} + +my %counter_cache; + +sub FETCH +{ + my $parser = $_[0]->{parser}; + my $from = $parser->{fulltextlen}-length(${$_[0]->{text}})-$_[0]->{prev} +; + + unless (exists $counter_cache{$from}) { + $parser->{lastlinenum} = $parser->{offsetlinenum} + - Parse::RecDescent::_linecount(substr($parser->{fulltext},$from)) + + 1; + $counter_cache{$from} = $parser->{lastlinenum}; + } + return $counter_cache{$from}; +} + +sub STORE +{ + my $parser = $_[0]->{parser}; + $parser->{offsetlinenum} -= $parser->{lastlinenum} - $_[1]; + return undef; +} + +sub resync # ($linecounter) +{ + my $self = tied($_[0]); + die "Tried to alter something other than a LineCounter\n" + unless $self =~ /Parse::RecDescent::LineCounter/; + + my $parser = $self->{parser}; + my $apparently = $parser->{offsetlinenum} + - Parse::RecDescent::_linecount(${$self->{text}}) + + 1; + + $parser->{offsetlinenum} += $parser->{lastlinenum} - $apparently; + return 1; +} + +package Parse::RecDescent::ColCounter; + +sub TIESCALAR # ($classname, \$text, $thisparser, $prevflag) +{ + bless { + text => $_[1], + parser => $_[2], + prev => $_[3]?1:0, + }, $_[0]; +} + +sub FETCH +{ + my $parser = $_[0]->{parser}; + my $missing = $parser->{fulltextlen}-length(${$_[0]->{text}})-$_[0]->{prev}+1; + substr($parser->{fulltext},0,$missing) =~ m/^(.*)\Z/m; + return length($1); +} + +sub STORE +{ + die "Can't set column number via \$thiscolumn\n"; +} + + +package Parse::RecDescent::OffsetCounter; + +sub TIESCALAR # ($classname, \$text, $thisparser, $prev) +{ + bless { + text => $_[1], + parser => $_[2], + prev => $_[3]?-1:0, + }, $_[0]; +} + +sub FETCH +{ + my $parser = $_[0]->{parser}; + return $parser->{fulltextlen}-length(${$_[0]->{text}})+$_[0]->{prev}; +} + +sub STORE +{ + die "Can't set current offset via \$thisoffset or \$prevoffset\n"; +} + + + +package Parse::RecDescent::Rule; + +sub new ($$$$$) +{ + my $class = ref($_[0]) || $_[0]; + my $name = $_[1]; + my $owner = $_[2]; + my $line = $_[3]; + my $replace = $_[4]; + + if (defined $owner->{"rules"}{$name}) + { + my $self = $owner->{"rules"}{$name}; + if ($replace && !$self->{"changed"}) + { + $self->reset; + } + return $self; + } + else + { + return $owner->{"rules"}{$name} = + bless + { + "name" => $name, + "prods" => [], + "calls" => [], + "changed" => 0, + "line" => $line, + "impcount" => 0, + "opcount" => 0, + "vars" => "", + }, $class; + } +} + +sub reset($) +{ + @{$_[0]->{"prods"}} = (); + @{$_[0]->{"calls"}} = (); + $_[0]->{"changed"} = 0; + $_[0]->{"impcount"} = 0; + $_[0]->{"opcount"} = 0; + $_[0]->{"vars"} = ""; +} + +sub DESTROY {} + +sub hasleftmost($$) +{ + my ($self, $ref) = @_; + + my $prod; + foreach $prod ( @{$self->{"prods"}} ) + { + return 1 if $prod->hasleftmost($ref); + } + + return 0; +} + +sub leftmostsubrules($) +{ + my $self = shift; + my @subrules = (); + + my $prod; + foreach $prod ( @{$self->{"prods"}} ) + { + push @subrules, $prod->leftmostsubrule(); + } + + return @subrules; +} + +sub expected($) +{ + my $self = shift; + my @expected = (); + + my $prod; + foreach $prod ( @{$self->{"prods"}} ) + { + my $next = $prod->expected(); + unless (! 
$next or _contains($next,@expected) ) + { + push @expected, $next; + } + } + + return join ', or ', @expected; +} + +sub _contains($@) +{ + my $target = shift; + my $item; + foreach $item ( @_ ) { return 1 if $target eq $item; } + return 0; +} + +sub addcall($$) +{ + my ( $self, $subrule ) = @_; + unless ( _contains($subrule, @{$self->{"calls"}}) ) + { + push @{$self->{"calls"}}, $subrule; + } +} + +sub addprod($$) +{ + my ( $self, $prod ) = @_; + push @{$self->{"prods"}}, $prod; + $self->{"changed"} = 1; + $self->{"impcount"} = 0; + $self->{"opcount"} = 0; + $prod->{"number"} = $#{$self->{"prods"}}; + return $prod; +} + +sub addvar +{ + my ( $self, $var, $parser ) = @_; + if ($var =~ /\A\s*local\s+([%@\$]\w+)/) + { + $parser->{localvars} .= " $1"; + $self->{"vars"} .= "$var;\n" } + else + { $self->{"vars"} .= "my $var;\n" } + $self->{"changed"} = 1; + return 1; +} + +sub addautoscore +{ + my ( $self, $code ) = @_; + $self->{"autoscore"} = $code; + $self->{"changed"} = 1; + return 1; +} + +sub nextoperator($) +{ + my $self = shift; + my $prodcount = scalar @{$self->{"prods"}}; + my $opcount = ++$self->{"opcount"}; + return "_operator_${opcount}_of_production_${prodcount}_of_rule_$self->{name}"; +} + +sub nextimplicit($) +{ + my $self = shift; + my $prodcount = scalar @{$self->{"prods"}}; + my $impcount = ++$self->{"impcount"}; + return "_alternation_${impcount}_of_production_${prodcount}_of_rule_$self->{name}"; +} + + +sub code +{ + my ($self, $namespace, $parser) = @_; + +eval 'undef &' . $namespace . '::' . $self->{"name"} unless $parser->{saving}; + + my $code = +' +# ARGS ARE: ($parser, $text; $repeating, $_noactions, \@args) +sub ' . $namespace . '::' . $self->{"name"} . ' +{ + my $thisparser = $_[0]; + use vars q{$tracelevel}; + local $tracelevel = ($tracelevel||0)+1; + $ERRORS = 0; + my $thisrule = $thisparser->{"rules"}{"' . $self->{"name"} . '"}; + + Parse::RecDescent::_trace(q{Trying rule: [' . $self->{"name"} . ']}, + Parse::RecDescent::_tracefirst($_[1]), + q{' . $self->{"name"} . '}, + $tracelevel) + if defined $::RD_TRACE; + + ' . ($parser->{deferrable} + ? 'my $def_at = @{$thisparser->{deferred}};' + : '') . + ' + my $err_at = @{$thisparser->{errors}}; + + my $score; + my $score_return; + my $_tok; + my $return = undef; + my $_matched=0; + my $commit=0; + my @item = (); + my %item = (); + my $repeating = defined($_[2]) && $_[2]; + my $_noactions = defined($_[3]) && $_[3]; + my @arg = defined $_[4] ? @{ &{$_[4]} } : (); + my %arg = ($#arg & 01) ? @arg : (@arg, undef); + my $text; + my $lastsep=""; + my $expectation = new Parse::RecDescent::Expectation($thisrule->expected()); + $expectation->at($_[1]); + '. ($parser->{_check}{thisoffset}?' + my $thisoffset; + tie $thisoffset, q{Parse::RecDescent::OffsetCounter}, \$text, $thisparser; + ':'') . ($parser->{_check}{prevoffset}?' + my $prevoffset; + tie $prevoffset, q{Parse::RecDescent::OffsetCounter}, \$text, $thisparser, 1; + ':'') . ($parser->{_check}{thiscolumn}?' + my $thiscolumn; + tie $thiscolumn, q{Parse::RecDescent::ColCounter}, \$text, $thisparser; + ':'') . ($parser->{_check}{prevcolumn}?' + my $prevcolumn; + tie $prevcolumn, q{Parse::RecDescent::ColCounter}, \$text, $thisparser, 1; + ':'') . ($parser->{_check}{prevline}?' + my $prevline; + tie $prevline, q{Parse::RecDescent::LineCounter}, \$text, $thisparser, 1; + ':'') . ' + my $thisline; + tie $thisline, q{Parse::RecDescent::LineCounter}, \$text, $thisparser; + + '. $self->{vars} .' 
+'; + + my $prod; + foreach $prod ( @{$self->{"prods"}} ) + { + $prod->addscore($self->{autoscore},0,0) if $self->{autoscore}; + next unless $prod->checkleftmost(); + $code .= $prod->code($namespace,$self,$parser); + + $code .= $parser->{deferrable} + ? ' splice + @{$thisparser->{deferred}}, $def_at unless $_matched; + ' + : ''; + } + + $code .= +' + unless ( $_matched || defined($return) || defined($score) ) + { + ' .($parser->{deferrable} + ? ' splice @{$thisparser->{deferred}}, $def_at; + ' + : '') . ' + + $_[1] = $text; # NOT SURE THIS IS NEEDED + Parse::RecDescent::_trace(q{<>}, + Parse::RecDescent::_tracefirst($_[1]), + q{' . $self->{"name"} .'}, + $tracelevel) + if defined $::RD_TRACE; + return undef; + } + if (!defined($return) && defined($score)) + { + Parse::RecDescent::_trace(q{>>Accepted scored production<<}, "", + q{' . $self->{"name"} .'}, + $tracelevel) + if defined $::RD_TRACE; + $return = $score_return; + } + splice @{$thisparser->{errors}}, $err_at; + $return = $item[$#item] unless defined $return; + if (defined $::RD_TRACE) + { + Parse::RecDescent::_trace(q{>>Matched rule<< (return value: [} . + $return . q{])}, "", + q{' . $self->{"name"} .'}, + $tracelevel); + Parse::RecDescent::_trace(q{(consumed: [} . + Parse::RecDescent::_tracemax(substr($_[1],0,-length($text))) . q{])}, + Parse::RecDescent::_tracefirst($text), + , q{' . $self->{"name"} .'}, + $tracelevel) + } + $_[1] = $text; + return $return; +} +'; + + return $code; +} + +my @left; +sub isleftrec($$) +{ + my ($self, $rules) = @_; + my $root = $self->{"name"}; + @left = $self->leftmostsubrules(); + my $next; + foreach $next ( @left ) + { + next unless defined $rules->{$next}; # SKIP NON-EXISTENT RULES + return 1 if $next eq $root; + my $child; + foreach $child ( $rules->{$next}->leftmostsubrules() ) + { + push(@left, $child) + if ! _contains($child, @left) ; + } + } + return 0; +} + +package Parse::RecDescent::Production; + +sub describe ($;$) +{ + return join ' ', map { $_->describe($_[1]) or () } @{$_[0]->{items}}; +} + +sub new ($$;$$) +{ + my ($self, $line, $uncommit, $error) = @_; + my $class = ref($self) || $self; + + bless + { + "items" => [], + "uncommit" => $uncommit, + "error" => $error, + "line" => $line, + strcount => 0, + patcount => 0, + dircount => 0, + actcount => 0, + }, $class; +} + +sub expected ($) +{ + my $itemcount = scalar @{$_[0]->{"items"}}; + return ($itemcount) ? $_[0]->{"items"}[0]->describe(1) : ''; +} + +sub hasleftmost ($$) +{ + my ($self, $ref) = @_; + return ${$self->{"items"}}[0] eq $ref if scalar @{$self->{"items"}}; + return 0; +} + +sub leftmostsubrule($) +{ + my $self = shift; + + if ( $#{$self->{"items"}} >= 0 ) + { + my $subrule = $self->{"items"}[0]->issubrule(); + return $subrule if defined $subrule; + } + + return (); +} + +sub checkleftmost($) +{ + my @items = @{$_[0]->{"items"}}; + if (@items==1 && ref($items[0]) =~ /\AParse::RecDescent::Error/ + && $items[0]->{commitonly} ) + { + Parse::RecDescent::_warn(2,"Lone in production treated + as "); + Parse::RecDescent::_hint("A production consisting of a single + conditional directive would + normally succeed (with the value zero) if the + rule is not 'commited' when it is + tried. 
Since you almost certainly wanted + ' ' Parse::RecDescent + supplied it for you."); + push @{$_[0]->{items}}, + Parse::RecDescent::UncondReject->new(0,0,''); + } + elsif (@items==1 && ($items[0]->describe||"") =~ /describe||"") =~ /describe ."]"); + my $what = $items[0]->describe =~ / (which acts like an unconditional during parsing)" + : $items[0]->describe =~ / (which acts like an unconditional during parsing)" + : "an unconditional "; + my $caveat = $items[0]->describe =~ / 1 + ? "However, there were also other (useless) items after the leading " + . $items[0]->describe + . ", so you may have been expecting some other behaviour." + : "You can safely ignore this message."; + Parse::RecDescent::_hint("The production starts with $what. That means that the + production can never successfully match, so it was + optimized out of the final parser$caveat. $advice"); + return 0; + } + return 1; +} + +sub changesskip($) +{ + my $item; + foreach $item (@{$_[0]->{"items"}}) + { + if (ref($item) =~ /Parse::RecDescent::(Action|Directive)/) + { + return 1 if $item->{code} =~ /\$skip/; + } + } + return 0; +} + +sub adddirective +{ + my ( $self, $whichop, $line, $name ) = @_; + push @{$self->{op}}, + { type=>$whichop, line=>$line, name=>$name, + offset=> scalar(@{$self->{items}}) }; +} + +sub addscore +{ + my ( $self, $code, $lookahead, $line ) = @_; + $self->additem(Parse::RecDescent::Directive->new( + "local \$^W; + my \$thisscore = do { $code } + 0; + if (!defined(\$score) || \$thisscore>\$score) + { \$score=\$thisscore; \$score_return=\$item[-1]; } + undef;", $lookahead, $line,"") ) + unless $self->{items}[-1]->describe =~ /{op}) + { + while (my $next = pop @{$self->{op}}) + { + Parse::RecDescent::_error("Incomplete <$next->{type}op:...>.", $line); + Parse::RecDescent::_hint( + "The current production ended without completing the + <$next->{type}op:...> directive that started near line + $next->{line}. Did you forget the closing '>'?"); + } + } + return 1; +} + +sub enddirective +{ + my ( $self, $line, $minrep, $maxrep ) = @_; + unless ($self->{op}) + { + Parse::RecDescent::_error("Unmatched > found.", $line); + Parse::RecDescent::_hint( + "A '>' angle bracket was encountered, which typically + indicates the end of a directive. However no suitable + preceding directive was encountered. Typically this + indicates either a extra '>' in the grammar, or a + problem inside the previous directive."); + return; + } + my $op = pop @{$self->{op}}; + my $span = @{$self->{items}} - $op->{offset}; + if ($op->{type} =~ /left|right/) + { + if ($span != 3) + { + Parse::RecDescent::_error( + "Incorrect <$op->{type}op:...> specification: + expected 3 args, but found $span instead", $line); + Parse::RecDescent::_hint( + "The <$op->{type}op:...> directive requires a + sequence of exactly three elements. For example: + <$op->{type}op:leftarg /op/ rightarg>"); + } + else + { + push @{$self->{items}}, + Parse::RecDescent::Operator->new( + $op->{type}, $minrep, $maxrep, splice(@{$self->{"items"}}, -3)); + $self->{items}[-1]->sethashname($self); + $self->{items}[-1]{name} = $op->{name}; + } + } +} + +sub prevwasreturn +{ + my ( $self, $line ) = @_; + unless (@{$self->{items}}) + { + Parse::RecDescent::_error( + "Incorrect specification: + expected item missing", $line); + Parse::RecDescent::_hint( + "The directive requires a + sequence of at least one item. 
For example: + "); + return; + } + push @{$self->{items}}, + Parse::RecDescent::Result->new(); +} + +sub additem +{ + my ( $self, $item ) = @_; + $item->sethashname($self); + push @{$self->{"items"}}, $item; + return $item; +} + + +sub preitempos +{ + return q + { + push @itempos, {'offset' => {'from'=>$thisoffset, 'to'=>undef}, + 'line' => {'from'=>$thisline, 'to'=>undef}, + 'column' => {'from'=>$thiscolumn, 'to'=>undef} }; + } +} + +sub incitempos +{ + return q + { + $itempos[$#itempos]{'offset'}{'from'} += length($1); + $itempos[$#itempos]{'line'}{'from'} = $thisline; + $itempos[$#itempos]{'column'}{'from'} = $thiscolumn; + } +} + +sub postitempos +{ + return q + { + $itempos[$#itempos]{'offset'}{'to'} = $prevoffset; + $itempos[$#itempos]{'line'}{'to'} = $prevline; + $itempos[$#itempos]{'column'}{'to'} = $prevcolumn; + } +} + +sub code($$$$) +{ + my ($self,$namespace,$rule,$parser) = @_; + my $code = +' + while (!$_matched' + . (defined $self->{"uncommit"} ? '' : ' && !$commit') + . ') + { + ' . + ($self->changesskip() + ? 'local $skip = defined($skip) ? $skip : $Parse::RecDescent::skip;' + : '') .' + Parse::RecDescent::_trace(q{Trying production: [' + . $self->describe . ']}, + Parse::RecDescent::_tracefirst($_[1]), + q{' . $rule ->{name}. '}, + $tracelevel) + if defined $::RD_TRACE; + my $thisprod = $thisrule->{"prods"}[' . $self->{"number"} . ']; + ' . (defined $self->{"error"} ? '' : '$text = $_[1];' ) . ' + my $_savetext; + @item = (q{' . $rule->{"name"} . '}); + %item = (__RULE__ => q{' . $rule->{"name"} . '}); + my $repcount = 0; + +'; + $code .= +' my @itempos = ({}); +' if $parser->{_check}{itempos}; + + my $item; + my $i; + + for ($i = 0; $i < @{$self->{"items"}}; $i++) + { + $item = ${$self->{items}}[$i]; + + $code .= preitempos() if $parser->{_check}{itempos}; + + $code .= $item->code($namespace,$rule,$parser->{_check}); + + $code .= postitempos() if $parser->{_check}{itempos}; + + } + + if ($parser->{_AUTOACTION} && defined($item) && !$item->isa("Parse::RecDescent::Action")) + { + $code .= $parser->{_AUTOACTION}->code($namespace,$rule); + Parse::RecDescent::_warn(1,"Autogenerating action in rule + \"$rule->{name}\": + $parser->{_AUTOACTION}{code}") + and + Parse::RecDescent::_hint("The \$::RD_AUTOACTION was defined, + so any production not ending in an + explicit action has the specified + \"auto-action\" automatically + appended."); + } + elsif ($parser->{_AUTOTREE} && defined($item) && !$item->isa("Parse::RecDescent::Action")) + { + if ($i==1 && $item->isterminal) + { + $code .= $parser->{_AUTOTREE}{TERMINAL}->code($namespace,$rule); + } + else + { + $code .= $parser->{_AUTOTREE}{NODE}->code($namespace,$rule); + } + Parse::RecDescent::_warn(1,"Autogenerating tree-building action in rule + \"$rule->{name}\"") + and + Parse::RecDescent::_hint("The directive was specified, + so any production not ending + in an explicit action has + some parse-tree building code + automatically appended."); + } + + $code .= +' + + Parse::RecDescent::_trace(q{>>Matched production: [' + . $self->describe . ']<<}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{name} . '}, + $tracelevel) + if defined $::RD_TRACE; + $_matched = 1; + last; + } + +'; + return $code; +} + +1; + +package Parse::RecDescent::Action; + +sub describe { undef } + +sub sethashname { $_[0]->{hashname} = '__ACTION' . 
++$_[1]->{actcount} .'__'; } + +sub new +{ + my $class = ref($_[0]) || $_[0]; + bless + { + "code" => $_[1], + "lookahead" => $_[2], + "line" => $_[3], + }, $class; +} + +sub issubrule { undef } +sub isterminal { 0 } + +sub code($$$$) +{ + my ($self, $namespace, $rule) = @_; + +' + Parse::RecDescent::_trace(q{Trying action}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{name} . '}, + $tracelevel) + if defined $::RD_TRACE; + ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) .' + + $_tok = ($_noactions) ? 0 : do ' . $self->{"code"} . '; + ' . ($self->{"lookahead"}<0?'if':'unless') . ' (defined $_tok) + { + Parse::RecDescent::_trace(q{<> (return value: [undef])}) + if defined $::RD_TRACE; + last; + } + Parse::RecDescent::_trace(q{>>Matched action<< (return value: [} + . $_tok . q{])}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + push @item, $_tok; + ' . ($self->{line}>=0 ? '$item{'. $self->{hashname} .'}=$_tok;' : '' ) .' + ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' +' +} + + +1; + +package Parse::RecDescent::Directive; + +sub sethashname { $_[0]->{hashname} = '__DIRECTIVE' . ++$_[1]->{dircount} . '__'; } + +sub issubrule { undef } +sub isterminal { 0 } +sub describe { $_[1] ? '' : $_[0]->{name} } + +sub new ($$$$$) +{ + my $class = ref($_[0]) || $_[0]; + bless + { + "code" => $_[1], + "lookahead" => $_[2], + "line" => $_[3], + "name" => $_[4], + }, $class; +} + +sub code($$$$) +{ + my ($self, $namespace, $rule) = @_; + +' + ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) .' + + Parse::RecDescent::_trace(q{Trying directive: [' + . $self->describe . ']}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{name} . '}, + $tracelevel) + if defined $::RD_TRACE; ' .' + $_tok = do { ' . $self->{"code"} . ' }; + if (defined($_tok)) + { + Parse::RecDescent::_trace(q{>>Matched directive<< (return value: [} + . $_tok . q{])}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + } + else + { + Parse::RecDescent::_trace(q{<>}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + } + ' . ($self->{"lookahead"} ? '$text = $_savetext and ' : '' ) .' + last ' + . ($self->{"lookahead"}<0?'if':'unless') . ' defined $_tok; + push @item, $item{'.$self->{hashname}.'}=$_tok; + ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' +' +} + +1; + +package Parse::RecDescent::UncondReject; + +sub issubrule { undef } +sub isterminal { 0 } +sub describe { $_[1] ? '' : $_[0]->{name} } +sub sethashname { $_[0]->{hashname} = '__DIRECTIVE' . ++$_[1]->{dircount} . '__'; } + +sub new ($$$;$) +{ + my $class = ref($_[0]) || $_[0]; + bless + { + "lookahead" => $_[1], + "line" => $_[2], + "name" => $_[3], + }, $class; +} + +# MARK, YOU MAY WANT TO OPTIMIZE THIS. + + +sub code($$$$) +{ + my ($self, $namespace, $rule) = @_; + +' + Parse::RecDescent::_trace(q{>>Rejecting production<< (found ' + . $self->describe . ')}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{name} . '}, + $tracelevel) + if defined $::RD_TRACE; + undef $return; + ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) .' + + $_tok = undef; + ' . ($self->{"lookahead"} ? '$text = $_savetext and ' : '' ) .' + last ' + . ($self->{"lookahead"}<0?'if':'unless') . ' defined $_tok; +' +} + +1; + +package Parse::RecDescent::Error; + +sub issubrule { undef } +sub isterminal { 0 } +sub describe { $_[1] ? '' : $_[0]->{commitonly} ? '' : '' } +sub sethashname { $_[0]->{hashname} = '__DIRECTIVE' . ++$_[1]->{dircount} . 
'__'; } + +sub new ($$$$$) +{ + my $class = ref($_[0]) || $_[0]; + bless + { + "msg" => $_[1], + "lookahead" => $_[2], + "commitonly" => $_[3], + "line" => $_[4], + }, $class; +} + +sub code($$$$) +{ + my ($self, $namespace, $rule) = @_; + + my $action = ''; + + if ($self->{"msg"}) # ERROR MESSAGE SUPPLIED + { + #WAS: $action .= "Parse::RecDescent::_error(qq{$self->{msg}}" . ',$thisline);'; + $action .= 'push @{$thisparser->{errors}}, [qq{'.$self->{msg}.'},$thisline];'; + + } + else # GENERATE ERROR MESSAGE DURING PARSE + { + $action .= ' + my $rule = $item[0]; + $rule =~ s/_/ /g; + #WAS: Parse::RecDescent::_error("Invalid $rule: " . $expectation->message() ,$thisline); + push @{$thisparser->{errors}}, ["Invalid $rule: " . $expectation->message() ,$thisline]; + '; + } + + my $dir = + new Parse::RecDescent::Directive('if (' . + ($self->{"commitonly"} ? '$commit' : '1') . + ") { do {$action} unless ".' $_noactions; undef } else {0}', + $self->{"lookahead"},0,$self->describe); + $dir->{hashname} = $self->{hashname}; + return $dir->code($namespace, $rule, 0); +} + +1; + +package Parse::RecDescent::Token; + +sub sethashname { $_[0]->{hashname} = '__PATTERN' . ++$_[1]->{patcount} . '__'; } + +sub issubrule { undef } +sub isterminal { 1 } +sub describe ($) { shift->{'description'}} + + +# ARGS ARE: $self, $pattern, $left_delim, $modifiers, $lookahead, $linenum +sub new ($$$$$$) +{ + my $class = ref($_[0]) || $_[0]; + my $pattern = $_[1]; + my $pat = $_[1]; + my $ldel = $_[2]; + my $rdel = $ldel; + $rdel =~ tr/{[(/; + + my $mod = $_[3]; + + my $desc; + + if ($ldel eq '/') { $desc = "$ldel$pattern$rdel$mod" } + else { $desc = "m$ldel$pattern$rdel$mod" } + $desc =~ s/\\/\\\\/g; + $desc =~ s/\$$/\\\$/g; + $desc =~ s/}/\\}/g; + $desc =~ s/{/\\{/g; + + if (!eval "no strict; + local \$SIG{__WARN__} = sub {0}; + '' =~ m$ldel$pattern$rdel" and $@) + { + Parse::RecDescent::_warn(3, "Token pattern \"m$ldel$pattern$rdel\" + may not be a valid regular expression", + $_[5]); + $@ =~ s/ at \(eval.*/./; + Parse::RecDescent::_hint($@); + } + + # QUIETLY PREVENT (WELL-INTENTIONED) CALAMITY + $mod =~ s/[gc]//g; + $pattern =~ s/(\A|[^\\])\\G/$1/g; + + bless + { + "pattern" => $pattern, + "ldelim" => $ldel, + "rdelim" => $rdel, + "mod" => $mod, + "lookahead" => $_[4], + "line" => $_[5], + "description" => $desc, + }, $class; +} + + +sub code($$$$) +{ + my ($self, $namespace, $rule, $check) = @_; + my $ldel = $self->{"ldelim"}; + my $rdel = $self->{"rdelim"}; + my $sdel = $ldel; + my $mod = $self->{"mod"}; + + $sdel =~ s/[[{(<]/{}/; + +my $code = ' + Parse::RecDescent::_trace(q{Trying terminal: [' . $self->describe + . ']}, Parse::RecDescent::_tracefirst($text), + q{' . $rule->{name} . '}, + $tracelevel) + if defined $::RD_TRACE; + $lastsep = ""; + $expectation->is(q{' . ($rule->hasleftmost($self) ? '' + : $self->describe ) . '})->at($text); + ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) . ' + + ' . ($self->{"lookahead"}<0?'if':'unless') + . ' ($text =~ s/\A($skip)/$lastsep=$1 and ""/e and ' + . ($check->{itempos}? 'do {'.Parse::RecDescent::Production::incitempos().' 1} and ' : '') + . ' $text =~ s' . $ldel . '\A(?:' . $self->{"pattern"} . ')' + . $rdel . $sdel . $mod . ') + { + '.($self->{"lookahead"} ? '$text = $_savetext;' : '').' + $expectation->failed(); + Parse::RecDescent::_trace(q{<>}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + + last; + } + Parse::RecDescent::_trace(q{>>Matched terminal<< (return value: [} + . $& . 
q{])}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + push @item, $item{'.$self->{hashname}.'}=$&; + ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' +'; + + return $code; +} + +1; + +package Parse::RecDescent::Literal; + +sub sethashname { $_[0]->{hashname} = '__STRING' . ++$_[1]->{strcount} . '__'; } + +sub issubrule { undef } +sub isterminal { 1 } +sub describe ($) { shift->{'description'} } + +sub new ($$$$) +{ + my $class = ref($_[0]) || $_[0]; + + my $pattern = $_[1]; + + my $desc = $pattern; + $desc=~s/\\/\\\\/g; + $desc=~s/}/\\}/g; + $desc=~s/{/\\{/g; + + bless + { + "pattern" => $pattern, + "lookahead" => $_[2], + "line" => $_[3], + "description" => "'$desc'", + }, $class; +} + + +sub code($$$$) +{ + my ($self, $namespace, $rule, $check) = @_; + +my $code = ' + Parse::RecDescent::_trace(q{Trying terminal: [' . $self->describe + . ']}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{name} . '}, + $tracelevel) + if defined $::RD_TRACE; + $lastsep = ""; + $expectation->is(q{' . ($rule->hasleftmost($self) ? '' + : $self->describe ) . '})->at($text); + ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) . ' + + ' . ($self->{"lookahead"}<0?'if':'unless') + . ' ($text =~ s/\A($skip)/$lastsep=$1 and ""/e and ' + . ($check->{itempos}? 'do {'.Parse::RecDescent::Production::incitempos().' 1} and ' : '') + . ' $text =~ s/\A' . quotemeta($self->{"pattern"}) . '//) + { + '.($self->{"lookahead"} ? '$text = $_savetext;' : '').' + $expectation->failed(); + Parse::RecDescent::_trace(qq{<>}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + last; + } + Parse::RecDescent::_trace(q{>>Matched terminal<< (return value: [} + . $& . q{])}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + push @item, $item{'.$self->{hashname}.'}=$&; + ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' +'; + + return $code; +} + +1; + +package Parse::RecDescent::InterpLit; + +sub sethashname { $_[0]->{hashname} = '__STRING' . ++$_[1]->{strcount} . '__'; } + +sub issubrule { undef } +sub isterminal { 1 } +sub describe ($) { shift->{'description'} } + +sub new ($$$$) +{ + my $class = ref($_[0]) || $_[0]; + + my $pattern = $_[1]; + $pattern =~ s#/#\\/#g; + + my $desc = $pattern; + $desc=~s/\\/\\\\/g; + $desc=~s/}/\\}/g; + $desc=~s/{/\\{/g; + + bless + { + "pattern" => $pattern, + "lookahead" => $_[2], + "line" => $_[3], + "description" => "'$desc'", + }, $class; +} + +sub code($$$$) +{ + my ($self, $namespace, $rule, $check) = @_; + +my $code = ' + Parse::RecDescent::_trace(q{Trying terminal: [' . $self->describe + . ']}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{name} . '}, + $tracelevel) + if defined $::RD_TRACE; + $lastsep = ""; + $expectation->is(q{' . ($rule->hasleftmost($self) ? '' + : $self->describe ) . '})->at($text); + ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) . ' + + ' . ($self->{"lookahead"}<0?'if':'unless') + . ' ($text =~ s/\A($skip)/$lastsep=$1 and ""/e and ' + . ($check->{itempos}? 'do {'.Parse::RecDescent::Production::incitempos().' 1} and ' : '') + . ' do { $_tok = "' . $self->{"pattern"} . '"; 1 } and + substr($text,0,length($_tok)) eq $_tok and + do { substr($text,0,length($_tok)) = ""; 1; } + ) + { + '.($self->{"lookahead"} ? '$text = $_savetext;' : '').' + $expectation->failed(); + Parse::RecDescent::_trace(q{<>}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + last; + } + Parse::RecDescent::_trace(q{>>Matched terminal<< (return value: [} + . $_tok . 
q{])}, + Parse::RecDescent::_tracefirst($text)) + if defined $::RD_TRACE; + push @item, $item{'.$self->{hashname}.'}=$_tok; + ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' +'; + + return $code; +} + +1; + +package Parse::RecDescent::Subrule; + +sub issubrule ($) { return $_[0]->{"subrule"} } +sub isterminal { 0 } +sub sethashname {} + +sub describe ($) +{ + my $desc = $_[0]->{"implicit"} || $_[0]->{"subrule"}; + $desc = "" if $_[0]->{"matchrule"}; + return $desc; +} + +sub callsyntax($$) +{ + if ($_[0]->{"matchrule"}) + { + return "&{'$_[1]'.qq{$_[0]->{subrule}}}"; + } + else + { + return $_[1].$_[0]->{"subrule"}; + } +} + +sub new ($$$$;$$$) +{ + my $class = ref($_[0]) || $_[0]; + bless + { + "subrule" => $_[1], + "lookahead" => $_[2], + "line" => $_[3], + "implicit" => $_[4] || undef, + "matchrule" => $_[5], + "argcode" => $_[6] || undef, + }, $class; +} + + +sub code($$$$) +{ + my ($self, $namespace, $rule) = @_; + +' + Parse::RecDescent::_trace(q{Trying subrule: [' . $self->{"subrule"} . ']}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} . '}, + $tracelevel) + if defined $::RD_TRACE; + if (1) { no strict qw{refs}; + $expectation->is(' . ($rule->hasleftmost($self) ? 'q{}' + # WAS : 'qq{'.$self->describe.'}' ) . ')->at($text); + : 'q{'.$self->describe.'}' ) . ')->at($text); + ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) + . ($self->{"lookahead"}<0?'if':'unless') + . ' (defined ($_tok = ' + . $self->callsyntax($namespace.'::') + . '($thisparser,$text,$repeating,' + . ($self->{"lookahead"}?'1':'$_noactions') + . ($self->{argcode} ? ",sub { return $self->{argcode} }" + : ',sub { \\@arg }') + . '))) + { + '.($self->{"lookahead"} ? '$text = $_savetext;' : '').' + Parse::RecDescent::_trace(q{<{subrule} . ']>>}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} .'}, + $tracelevel) + if defined $::RD_TRACE; + $expectation->failed(); + last; + } + Parse::RecDescent::_trace(q{>>Matched subrule: [' + . $self->{subrule} . ']<< (return value: [} + . $_tok . q{]}, + + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} .'}, + $tracelevel) + if defined $::RD_TRACE; + $item{q{' . $self->{subrule} . '}} = $_tok; + push @item, $_tok; + ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' + } +' +} + +package Parse::RecDescent::Repetition; + +sub issubrule ($) { return $_[0]->{"subrule"} } +sub isterminal { 0 } +sub sethashname { } + +sub describe ($) +{ + my $desc = $_[0]->{"expected"} || $_[0]->{"subrule"}; + $desc = "" if $_[0]->{"matchrule"}; + return $desc; +} + +sub callsyntax($$) +{ + if ($_[0]->{matchrule}) + { return "sub { goto &{''.qq{$_[1]$_[0]->{subrule}}} }"; } + else + { return "\\&$_[1]$_[0]->{subrule}"; } +} + +sub new ($$$$$$$$$$) +{ + my ($self, $subrule, $repspec, $min, $max, $lookahead, $line, $parser, $matchrule, $argcode) = @_; + my $class = ref($self) || $self; + ($max, $min) = ( $min, $max) if ($max<$min); + + my $desc; + if ($subrule=~/\A_alternation_\d+_of_production_\d+_of_rule/) + { $desc = $parser->{"rules"}{$subrule}->expected } + + if ($lookahead) + { + if ($min>0) + { + return new Parse::RecDescent::Subrule($subrule,$lookahead,$line,$desc,$matchrule,$argcode); + } + else + { + Parse::RecDescent::_error("Not symbol (\"!\") before + \"$subrule\" doesn't make + sense.",$line); + Parse::RecDescent::_hint("Lookahead for negated optional + repetitions (such as + \"!$subrule($repspec)\" can never + succeed, since optional items always + match (zero times at worst). 
+ Did you mean a single \"!$subrule\", + instead?"); + } + } + bless + { + "subrule" => $subrule, + "repspec" => $repspec, + "min" => $min, + "max" => $max, + "lookahead" => $lookahead, + "line" => $line, + "expected" => $desc, + "argcode" => $argcode || undef, + "matchrule" => $matchrule, + }, $class; +} + +sub code($$$$) +{ + my ($self, $namespace, $rule) = @_; + + my ($subrule, $repspec, $min, $max, $lookahead) = + @{$self}{ qw{subrule repspec min max lookahead} }; + +' + Parse::RecDescent::_trace(q{Trying repeated subrule: [' . $self->describe . ']}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} . '}, + $tracelevel) + if defined $::RD_TRACE; + $expectation->is(' . ($rule->hasleftmost($self) ? 'q{}' + # WAS : 'qq{'.$self->describe.'}' ) . ')->at($text); + : 'q{'.$self->describe.'}' ) . ')->at($text); + ' . ($self->{"lookahead"} ? '$_savetext = $text;' : '' ) .' + unless (defined ($_tok = $thisparser->_parserepeat($text, ' + . $self->callsyntax($namespace.'::') + . ', ' . $min . ', ' . $max . ', ' + . ($self->{"lookahead"}?'1':'$_noactions') + . ',$expectation,' + . ($self->{argcode} ? "sub { return $self->{argcode} }" + : 'undef') + . '))) + { + Parse::RecDescent::_trace(q{<describe . ']>>}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} .'}, + $tracelevel) + if defined $::RD_TRACE; + last; + } + Parse::RecDescent::_trace(q{>>Matched repeated subrule: [' + . $self->{subrule} . ']<< (} + . @$_tok . q{ times)}, + + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} .'}, + $tracelevel) + if defined $::RD_TRACE; + $item{q{' . "$self->{subrule}($self->{repspec})" . '}} = $_tok; + push @item, $_tok; + ' . ($self->{"lookahead"} ? '$text = $_savetext;' : '' ) .' + +' +} + +package Parse::RecDescent::Result; + +sub issubrule { 0 } +sub isterminal { 0 } +sub describe { '' } + +sub new +{ + my ($class, $pos) = @_; + + bless {}, $class; +} + +sub code($$$$) +{ + my ($self, $namespace, $rule) = @_; + + ' + $return = $item[-1]; + '; +} + +package Parse::RecDescent::Operator; + +my @opertype = ( " non-optional", "n optional" ); + +sub issubrule { 0 } +sub isterminal { 0 } + +sub describe { $_[0]->{"expected"} } +sub sethashname { $_[0]->{hashname} = '__DIRECTIVE' . ++$_[1]->{dircount} . '__'; } + + +sub new +{ + my ($class, $type, $minrep, $maxrep, $leftarg, $op, $rightarg) = @_; + + bless + { + "type" => "${type}op", + "leftarg" => $leftarg, + "op" => $op, + "min" => $minrep, + "max" => $maxrep, + "rightarg" => $rightarg, + "expected" => "<${type}op: ".$leftarg->describe." ".$op->describe." ".$rightarg->describe.">", + }, $class; +} + +sub code($$$$) +{ + my ($self, $namespace, $rule) = @_; + + my ($leftarg, $op, $rightarg) = + @{$self}{ qw{leftarg op rightarg} }; + + my $code = ' + Parse::RecDescent::_trace(q{Trying operator: [' . $self->describe . ']}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} . '}, + $tracelevel) + if defined $::RD_TRACE; + $expectation->is(' . ($rule->hasleftmost($self) ? 'q{}' + # WAS : 'qq{'.$self->describe.'}' ) . ')->at($text); + : 'q{'.$self->describe.'}' ) . ')->at($text); + + $_tok = undef; + OPLOOP: while (1) + { + $repcount = 0; + my @item; + '; + + if ($self->{type} eq "leftop" ) + { + $code .= ' + # MATCH LEFTARG + ' . $leftarg->code(@_[1..2]) . ' + + $repcount++; + + my $savetext = $text; + my $backtrack; + + # MATCH (OP RIGHTARG)(s) + while ($repcount < ' . $self->{max} . ') + { + $backtrack = 0; + ' . $op->code(@_[1..2]) . ' + ' . ($op->isterminal() ? 'pop @item;' : '$backtrack=1;' ) . ' + ' . 
(ref($op) eq 'Parse::RecDescent::Token' + ? 'if (defined $1) {push @item, $item{'.($self->{name}||$self->{hashname}).'}=$1; $backtrack=1;}' + : "" ) . ' + ' . $rightarg->code(@_[1..2]) . ' + $savetext = $text; + $repcount++; + } + $text = $savetext; + pop @item if $backtrack; + + '; + } + else + { + $code .= ' + my $savetext = $text; + my $backtrack; + # MATCH (LEFTARG OP)(s) + while ($repcount < ' . $self->{max} . ') + { + $backtrack = 0; + ' . $leftarg->code(@_[1..2]) . ' + $repcount++; + $backtrack = 1; + ' . $op->code(@_[1..2]) . ' + $savetext = $text; + ' . ($op->isterminal() ? 'pop @item;' : "" ) . ' + ' . (ref($op) eq 'Parse::RecDescent::Token' ? 'do { push @item, $item{'.($self->{name}||$self->{hashname}).'}=$1; } if defined $1;' : "" ) . ' + } + $text = $savetext; + pop @item if $backtrack; + + # MATCH RIGHTARG + ' . $rightarg->code(@_[1..2]) . ' + $repcount++; + '; + } + + $code .= 'unless (@item) { undef $_tok; last }' unless $self->{min}==0; + + $code .= ' + $_tok = [ @item ]; + last; + } + + unless ($repcount>='.$self->{min}.') + { + Parse::RecDescent::_trace(q{<describe + . ']>>}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} .'}, + $tracelevel) + if defined $::RD_TRACE; + $expectation->failed(); + last; + } + Parse::RecDescent::_trace(q{>>Matched operator: [' + . $self->describe + . ']<< (return value: [} + . qq{@{$_tok||[]}} . q{]}, + Parse::RecDescent::_tracefirst($text), + q{' . $rule->{"name"} .'}, + $tracelevel) + if defined $::RD_TRACE; + + push @item, $item{'.($self->{name}||$self->{hashname}).'}=$_tok||[]; + +'; + return $code; +} + + +package Parse::RecDescent::Expectation; + +sub new ($) +{ + bless { + "failed" => 0, + "expected" => "", + "unexpected" => "", + "lastexpected" => "", + "lastunexpected" => "", + "defexpected" => $_[1], + }; +} + +sub is ($$) +{ + $_[0]->{lastexpected} = $_[1]; return $_[0]; +} + +sub at ($$) +{ + $_[0]->{lastunexpected} = $_[1]; return $_[0]; +} + +sub failed ($) +{ + return unless $_[0]->{lastexpected}; + $_[0]->{expected} = $_[0]->{lastexpected} unless $_[0]->{failed}; + $_[0]->{unexpected} = $_[0]->{lastunexpected} unless $_[0]->{failed}; + $_[0]->{failed} = 1; +} + +sub message ($) +{ + my ($self) = @_; + $self->{expected} = $self->{defexpected} unless $self->{expected}; + $self->{expected} =~ s/_/ /g; + if (!$self->{unexpected} || $self->{unexpected} =~ /\A\s*\Z/s) + { + return "Was expecting $self->{expected}"; + } + else + { + $self->{unexpected} =~ /\s*(.*)/; + return "Was expecting $self->{expected} but found \"$1\" instead"; + } +} + +1; + +package Parse::RecDescent; + +use Carp; +use vars qw ( $AUTOLOAD $VERSION ); + +my $ERRORS = 0; + +$VERSION = '1.94'; + +# BUILDING A PARSER + +my $nextnamespace = "namespace000001"; + +sub _nextnamespace() +{ + return "Parse::RecDescent::" . $nextnamespace++; +} + +sub new ($$$) +{ + my $class = ref($_[0]) || $_[0]; + local $Parse::RecDescent::compiling = $_[2]; + my $name_space_name = defined $_[3] + ? 
"Parse::RecDescent::".$_[3] + : _nextnamespace(); + my $self = + { + "rules" => {}, + "namespace" => $name_space_name, + "startcode" => '', + "localvars" => '', + "_AUTOACTION" => undef, + "_AUTOTREE" => undef, + }; + if ($::RD_AUTOACTION) + { + my $sourcecode = $::RD_AUTOACTION; + $sourcecode = "{ $sourcecode }" + unless $sourcecode =~ /\A\s*\{.*\}\s*\Z/; + $self->{_check}{itempos} = + $sourcecode =~ /\@itempos\b|\$itempos\s*\[/; + $self->{_AUTOACTION} + = new Parse::RecDescent::Action($sourcecode,0,-1) + } + + bless $self, $class; + shift; + return $self->Replace(@_) +} + +sub Compile($$$$) { + + die "Compilation of Parse::RecDescent grammars not yet implemented\n"; +} + +sub DESTROY {} # SO AUTOLOADER IGNORES IT + +# BUILDING A GRAMMAR.... + +sub Replace ($$) +{ + splice(@_, 2, 0, 1); + return _generate(@_); +} + +sub Extend ($$) +{ + splice(@_, 2, 0, 0); + return _generate(@_); +} + +sub _no_rule ($$;$) +{ + _error("Ruleless $_[0] at start of grammar.",$_[1]); + my $desc = $_[2] ? "\"$_[2]\"" : ""; + _hint("You need to define a rule for the $_[0] $desc + to be part of."); +} + +my $NEGLOOKAHEAD = '\G(\s*\.\.\.\!)'; +my $POSLOOKAHEAD = '\G(\s*\.\.\.)'; +my $RULE = '\G\s*(\w+)[ \t]*:'; +my $PROD = '\G\s*([|])'; +my $TOKEN = q{\G\s*/((\\\\/|[^/])*)/([cgimsox]*)}; +my $MTOKEN = q{\G\s*(m\s*[^\w\s])}; +my $LITERAL = q{\G\s*'((\\\\['\\\\]|[^'])*)'}; +my $INTERPLIT = q{\G\s*"((\\\\["\\\\]|[^"])*)"}; +my $SUBRULE = '\G\s*(\w+)'; +my $MATCHRULE = '\G(\s*{_check}{itempos} = ($grammar =~ /\@itempos\b|\$itempos\s*\[/) + unless $self->{_check}{itempos}; + for (qw(thisoffset thiscolumn prevline prevoffset prevcolumn)) + { + $self->{_check}{$_} = + ($grammar =~ /\$$_/) || $self->{_check}{itempos} + unless $self->{_check}{$_}; + } + my $line; + + my $rule = undef; + my $prod = undef; + my $item = undef; + my $lastgreedy = ''; + pos $grammar = 0; + study $grammar; + + while (pos $grammar < length $grammar) + { + $line = $lines - _linecount($grammar) + 1; + my $commitonly; + my $code = ""; + my @components = (); + if ($grammar =~ m/$COMMENT/gco) + { + _parse("a comment",0,$line); + next; + } + elsif ($grammar =~ m/$NEGLOOKAHEAD/gco) + { + _parse("a negative lookahead",$aftererror,$line); + $lookahead = $lookahead ? -$lookahead : -1; + $lookaheadspec .= $1; + next; # SKIP LOOKAHEAD RESET AT END OF while LOOP + } + elsif ($grammar =~ m/$POSLOOKAHEAD/gco) + { + _parse("a positive lookahead",$aftererror,$line); + $lookahead = $lookahead ? 
$lookahead : 1; + $lookaheadspec .= $1; + next; # SKIP LOOKAHEAD RESET AT END OF while LOOP + } + elsif ($grammar =~ m/(?=$ACTION)/gco + and do { ($code) = extract_codeblock($grammar); $code }) + { + _parse("an action", $aftererror, $line, $code); + $item = new Parse::RecDescent::Action($code,$lookahead,$line); + $prod and $prod->additem($item) + or $self->_addstartcode($code); + } + elsif ($grammar =~ m/(?=$IMPLICITSUBRULE)/gco + and do { ($code) = extract_codeblock($grammar,'{([',undef,'(',1); + $code }) + { + $code =~ s/\A\s*\(|\)\Z//g; + _parse("an implicit subrule", $aftererror, $line, + "( $code )"); + my $implicit = $rule->nextimplicit; + $self->_generate("$implicit : $code",$replace,1); + my $pos = pos $grammar; + substr($grammar,$pos,0,$implicit); + pos $grammar = $pos;; + } + elsif ($grammar =~ m/$ENDDIRECTIVEMK/gco) + { + + # EXTRACT TRAILING REPETITION SPECIFIER (IF ANY) + + my ($minrep,$maxrep) = (1,$MAXREP); + if ($grammar =~ m/\G[(]/gc) + { + pos($grammar)--; + + if ($grammar =~ m/$OPTIONAL/gco) + { ($minrep, $maxrep) = (0,1) } + elsif ($grammar =~ m/$ANY/gco) + { $minrep = 0 } + elsif ($grammar =~ m/$EXACTLY/gco) + { ($minrep, $maxrep) = ($1,$1) } + elsif ($grammar =~ m/$BETWEEN/gco) + { ($minrep, $maxrep) = ($1,$2) } + elsif ($grammar =~ m/$ATLEAST/gco) + { $minrep = $1 } + elsif ($grammar =~ m/$ATMOST/gco) + { $maxrep = $1 } + elsif ($grammar =~ m/$MANY/gco) + { } + elsif ($grammar =~ m/$BADREP/gco) + { + _parse("an invalid repetition specifier", 0,$line); + _error("Incorrect specification of a repeated directive", + $line); + _hint("Repeated directives cannot have + a maximum repetition of zero, nor can they have + negative components in their ranges."); + } + } + + $prod && $prod->enddirective($line,$minrep,$maxrep); + } + elsif ($grammar =~ m/\G\s*<[^m]/gc) + { + pos($grammar)-=2; + + if ($grammar =~ m/$OPMK/gco) + { + # $DB::single=1; + _parse("a $1-associative operator directive", $aftererror, $line, "<$1op:...>"); + $prod->adddirective($1, $line,$2||''); + } + elsif ($grammar =~ m/$UNCOMMITMK/gco) + { + _parse("an uncommit marker", $aftererror,$line); + $item = new Parse::RecDescent::Directive('$commit=0;1', + $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + elsif ($grammar =~ m/$QUOTELIKEMK/gco) + { + _parse("an perl quotelike marker", $aftererror,$line); + $item = new Parse::RecDescent::Directive( + 'my ($match,@res); + ($match,$text,undef,@res) = + Text::Balanced::extract_quotelike($text,$skip); + $match ? 
\@res : undef; + ', $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + elsif ($grammar =~ m/$CODEBLOCKMK/gco) + { + my $outer = $1||"{}"; + _parse("an perl codeblock marker", $aftererror,$line); + $item = new Parse::RecDescent::Directive( + 'Text::Balanced::extract_codeblock($text,undef,$skip,\''.$outer.'\'); + ', $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + elsif ($grammar =~ m/$VARIABLEMK/gco) + { + _parse("an perl variable marker", $aftererror,$line); + $item = new Parse::RecDescent::Directive( + 'Text::Balanced::extract_variable($text,$skip); + ', $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + elsif ($grammar =~ m/$NOCHECKMK/gco) + { + _parse("a disable checking marker", $aftererror,$line); + if ($rule) + { + _error(" directive not at start of grammar", $line); + _hint("The directive can only + be specified at the start of a + grammar (before the first rule + is defined."); + } + else + { + local $::RD_CHECK = 1; + } + } + elsif ($grammar =~ m/$AUTOSTUBMK/gco) + { + _parse("an autostub marker", $aftererror,$line); + $::RD_AUTOSTUB = ""; + } + elsif ($grammar =~ m/$AUTORULEMK/gco) + { + _parse("an autorule marker", $aftererror,$line); + $::RD_AUTOSTUB = $1; + } + elsif ($grammar =~ m/$AUTOTREEMK/gco) + { + _parse("an autotree marker", $aftererror,$line); + if ($rule) + { + _error(" directive not at start of grammar", $line); + _hint("The directive can only + be specified at the start of a + grammar (before the first rule + is defined."); + } + else + { + undef $self->{_AUTOACTION}; + $self->{_AUTOTREE}{NODE} + = new Parse::RecDescent::Action(q{{bless \%item, $item[0]}},0,-1); + $self->{_AUTOTREE}{TERMINAL} + = new Parse::RecDescent::Action(q{{bless {__VALUE__=>$item[1]}, $item[0]}},0,-1); + } + } + + elsif ($grammar =~ m/$REJECTMK/gco) + { + _parse("an reject marker", $aftererror,$line); + $item = new Parse::RecDescent::UncondReject($lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + elsif ($grammar =~ m/(?=$CONDREJECTMK)/gco + and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); + $code }) + { + _parse("a (conditional) reject marker", $aftererror,$line); + $code =~ /\A\s*\Z/s; + $item = new Parse::RecDescent::Directive( + "($1) ? 
undef : 1", $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + elsif ($grammar =~ m/(?=$SCOREMK)/gco + and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); + $code }) + { + _parse("a score marker", $aftererror,$line); + $code =~ /\A\s*\Z/s; + $prod and $prod->addscore($1, $lookahead, $line) + or _no_rule($code,$line); + } + elsif ($grammar =~ m/(?=$AUTOSCOREMK)/gco + and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); + $code; + } ) + { + _parse("an autoscore specifier", $aftererror,$line,$code); + $code =~ /\A\s*\Z/s; + + $rule and $rule->addautoscore($1,$self) + or _no_rule($code,$line); + + $item = new Parse::RecDescent::UncondReject($lookahead,$line,$code); + $prod and $prod->additem($item) + or _no_rule($code,$line); + } + elsif ($grammar =~ m/$RESYNCMK/gco) + { + _parse("a resync to newline marker", $aftererror,$line); + $item = new Parse::RecDescent::Directive( + 'if ($text =~ s/\A[^\n]*\n//) { $return = 0; $& } else { undef }', + $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + elsif ($grammar =~ m/(?=$RESYNCPATMK)/gco + and do { ($code) = extract_bracketed($grammar,'<'); + $code }) + { + _parse("a resync with pattern marker", $aftererror,$line); + $code =~ /\A\s*\Z/s; + $item = new Parse::RecDescent::Directive( + 'if ($text =~ s/\A'.$1.'//) { $return = 0; $& } else { undef }', + $lookahead,$line,$code); + $prod and $prod->additem($item) + or _no_rule($code,$line); + } + elsif ($grammar =~ m/(?=$SKIPMK)/gco + and do { ($code) = extract_codeblock($grammar,'<'); + $code }) + { + _parse("a skip marker", $aftererror,$line); + $code =~ /\A\s*\Z/s; + $item = new Parse::RecDescent::Directive( + 'my $oldskip = $skip; $skip='.$1.'; $oldskip', + $lookahead,$line,$code); + $prod and $prod->additem($item) + or _no_rule($code,$line); + } + elsif ($grammar =~ m/(?=$RULEVARPATMK)/gco + and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); + $code; + } ) + { + _parse("a rule variable specifier", $aftererror,$line,$code); + $code =~ /\A\s*\Z/s; + + $rule and $rule->addvar($1,$self) + or _no_rule($code,$line); + + $item = new Parse::RecDescent::UncondReject($lookahead,$line,$code); + $prod and $prod->additem($item) + or _no_rule($code,$line); + } + elsif ($grammar =~ m/(?=$DEFERPATMK)/gco + and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); + $code; + } ) + { + _parse("a deferred action specifier", $aftererror,$line,$code); + $code =~ s/\A\s*\Z/$1/s; + if ($code =~ /\A\s*[^{]|[^}]\s*\Z/) + { + $code = "{ $code }" + } + + $item = new Parse::RecDescent::Directive( + "push \@{\$thisparser->{deferred}}, sub $code;", + $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + + $self->{deferrable} = 1; + } + elsif ($grammar =~ m/(?=$TOKENPATMK)/gco + and do { ($code) = extract_codeblock($grammar,'{',undef,'<'); + $code; + } ) + { + _parse("a token constructor", $aftererror,$line,$code); + $code =~ s/\A\s*\Z/$1/s; + + my $types = eval 'no strict; local $SIG{__WARN__} = sub {0}; my @arr=('.$code.'); @arr' || (); + if (!$types) + { + _error("Incorrect token specification: \"$@\"", $line); + _hint("The directive requires a list + of one or more strings representing possible + types of the specified token. 
For example: + "); + } + else + { + $item = new Parse::RecDescent::Directive( + 'no strict; + $return = { text => $item[-1] }; + @{$return->{type}}{'.$code.'} = (1..'.$types.');', + $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + } + elsif ($grammar =~ m/$COMMITMK/gco) + { + _parse("an commit marker", $aftererror,$line); + $item = new Parse::RecDescent::Directive('$commit = 1', + $lookahead,$line,""); + $prod and $prod->additem($item) + or _no_rule("",$line); + } + elsif ($grammar =~ m/$AUTOERRORMK/gco) + { + $commitonly = $1; + _parse("an error marker", $aftererror,$line); + $item = new Parse::RecDescent::Error('',$lookahead,$1,$line); + $prod and $prod->additem($item) + or _no_rule("",$line); + $aftererror = !$commitonly; + } + elsif ($grammar =~ m/(?=$MSGERRORMK)/gco + and do { $commitonly = $1; + ($code) = extract_bracketed($grammar,'<'); + $code }) + { + _parse("an error marker", $aftererror,$line,$code); + $code =~ /\A\s*\Z/s; + $item = new Parse::RecDescent::Error($1,$lookahead,$commitonly,$line); + $prod and $prod->additem($item) + or _no_rule("$code",$line); + $aftererror = !$commitonly; + } + elsif (do { $commitonly = $1; + ($code) = extract_bracketed($grammar,'<'); + $code }) + { + if ($code =~ /^<[A-Z_]+>$/) + { + _error("Token items are not yet + supported: \"$code\"", + $line); + _hint("Items like $code that consist of angle + brackets enclosing a sequence of + uppercase characters will eventually + be used to specify pre-lexed tokens + in a grammar. That functionality is not + yet implemented. Or did you misspell + \"$code\"?"); + } + else + { + _error("Untranslatable item encountered: \"$code\"", + $line); + _hint("Did you misspell \"$code\" + or forget to comment it out?"); + } + } + } + elsif ($grammar =~ m/$RULE/gco) + { + _parseunneg("a rule declaration", 0, + $lookahead,$line) or next; + my $rulename = $1; + if ($rulename =~ /Replace|Extend|Precompile|Save/ ) + { + _warn(2,"Rule \"$rulename\" hidden by method + Parse::RecDescent::$rulename",$line) + and + _hint("The rule named \"$rulename\" cannot be directly + called through the Parse::RecDescent object + for this grammar (although it may still + be used as a subrule of other rules). 
+ It can't be directly called because + Parse::RecDescent::$rulename is already defined (it + is the standard method of all + parsers)."); + } + $rule = new Parse::RecDescent::Rule($rulename,$self,$line,$replace); + $prod->check_pending($line) if $prod; + $prod = $rule->addprod( new Parse::RecDescent::Production ); + $aftererror = 0; + } + elsif ($grammar =~ m/$UNCOMMITPROD/gco) + { + pos($grammar)-=9; + _parseunneg("a new (uncommitted) production", + 0, $lookahead, $line) or next; + + $prod->check_pending($line) if $prod; + $prod = new Parse::RecDescent::Production($line,1); + $rule and $rule->addprod($prod) + or _no_rule("",$line); + $aftererror = 0; + } + elsif ($grammar =~ m/$ERRORPROD/gco) + { + pos($grammar)-=6; + _parseunneg("a new (error) production", $aftererror, + $lookahead,$line) or next; + $prod->check_pending($line) if $prod; + $prod = new Parse::RecDescent::Production($line,0,1); + $rule and $rule->addprod($prod) + or _no_rule("",$line); + $aftererror = 0; + } + elsif ($grammar =~ m/$PROD/gco) + { + _parseunneg("a new production", 0, + $lookahead,$line) or next; + $rule + and (!$prod || $prod->check_pending($line)) + and $prod = $rule->addprod(new Parse::RecDescent::Production($line)) + or _no_rule("production",$line); + $aftererror = 0; + } + elsif ($grammar =~ m/$LITERAL/gco) + { + ($code = $1) =~ s/\\\\/\\/g; + _parse("a literal terminal", $aftererror,$line,$1); + $item = new Parse::RecDescent::Literal($code,$lookahead,$line); + $prod and $prod->additem($item) + or _no_rule("literal terminal",$line,"'$1'"); + } + elsif ($grammar =~ m/$INTERPLIT/gco) + { + _parse("an interpolated literal terminal", $aftererror,$line); + $item = new Parse::RecDescent::InterpLit($1,$lookahead,$line); + $prod and $prod->additem($item) + or _no_rule("interpolated literal terminal",$line,"'$1'"); + } + elsif ($grammar =~ m/$TOKEN/gco) + { + _parse("a /../ pattern terminal", $aftererror,$line); + $item = new Parse::RecDescent::Token($1,'/',$3?$3:'',$lookahead,$line); + $prod and $prod->additem($item) + or _no_rule("pattern terminal",$line,"/$1/"); + } + elsif ($grammar =~ m/(?=$MTOKEN)/gco + and do { ($code, undef, @components) + = extract_quotelike($grammar); + $code } + ) + + { + _parse("an m/../ pattern terminal", $aftererror,$line,$code); + $item = new Parse::RecDescent::Token(@components[3,2,8], + $lookahead,$line); + $prod and $prod->additem($item) + or _no_rule("pattern terminal",$line,$code); + } + elsif ($grammar =~ m/(?=$MATCHRULE)/gco + and do { ($code) = extract_bracketed($grammar,'<'); + $code + } + or $grammar =~ m/$SUBRULE/gco + and $code = $1) + { + my $name = $code; + my $matchrule = 0; + if (substr($name,0,1) eq '<') + { + $name =~ s/$MATCHRULE\s*//; + $name =~ s/\s*>\Z//; + $matchrule = 1; + } + + # EXTRACT TRAILING ARG LIST (IF ANY) + + my ($argcode) = extract_codeblock($grammar, "[]",'') || ''; + + # EXTRACT TRAILING REPETITION SPECIFIER (IF ANY) + + if ($grammar =~ m/\G[(]/gc) + { + pos($grammar)--; + + if ($grammar =~ m/$OPTIONAL/gco) + { + _parse("an zero-or-one subrule match", $aftererror,$line,"$code$argcode($1)"); + $item = new Parse::RecDescent::Repetition($name,$1,0,1, + $lookahead,$line, + $self, + $matchrule, + $argcode); + $prod and $prod->additem($item) + or _no_rule("repetition",$line,"$code$argcode($1)"); + + !$matchrule and $rule and $rule->addcall($name); + } + elsif ($grammar =~ m/$ANY/gco) + { + _parse("a zero-or-more subrule match", $aftererror,$line,"$code$argcode($1)"); + if ($2) + { + my $pos = pos $grammar; + substr($grammar,$pos,0, + "(s?) 
"); + + pos $grammar = $pos; + } + else + { + $item = new Parse::RecDescent::Repetition($name,$1,0,$MAXREP, + $lookahead,$line, + $self, + $matchrule, + $argcode); + $prod and $prod->additem($item) + or _no_rule("repetition",$line,"$code$argcode($1)"); + + !$matchrule and $rule and $rule->addcall($name); + + _check_insatiable($name,$1,$grammar,$line) if $::RD_CHECK; + } + } + elsif ($grammar =~ m/$MANY/gco) + { + _parse("a one-or-more subrule match", $aftererror,$line,"$code$argcode($1)"); + if ($2) + { + # $DB::single=1; + my $pos = pos $grammar; + substr($grammar,$pos,0, + " "); + + pos $grammar = $pos; + } + else + { + $item = new Parse::RecDescent::Repetition($name,$1,1,$MAXREP, + $lookahead,$line, + $self, + $matchrule, + $argcode); + + $prod and $prod->additem($item) + or _no_rule("repetition",$line,"$code$argcode($1)"); + + !$matchrule and $rule and $rule->addcall($name); + + _check_insatiable($name,$1,$grammar,$line) if $::RD_CHECK; + } + } + elsif ($grammar =~ m/$EXACTLY/gco) + { + _parse("an exactly-$1-times subrule match", $aftererror,$line,"$code$argcode($1)"); + if ($2) + { + my $pos = pos $grammar; + substr($grammar,$pos,0, + "($1) "); + + pos $grammar = $pos; + } + else + { + $item = new Parse::RecDescent::Repetition($name,$1,$1,$1, + $lookahead,$line, + $self, + $matchrule, + $argcode); + $prod and $prod->additem($item) + or _no_rule("repetition",$line,"$code$argcode($1)"); + + !$matchrule and $rule and $rule->addcall($name); + } + } + elsif ($grammar =~ m/$BETWEEN/gco) + { + _parse("a $1-to-$2 subrule match", $aftererror,$line,"$code$argcode($1..$2)"); + if ($3) + { + my $pos = pos $grammar; + substr($grammar,$pos,0, + "($1..$2) "); + + pos $grammar = $pos; + } + else + { + $item = new Parse::RecDescent::Repetition($name,"$1..$2",$1,$2, + $lookahead,$line, + $self, + $matchrule, + $argcode); + $prod and $prod->additem($item) + or _no_rule("repetition",$line,"$code$argcode($1..$2)"); + + !$matchrule and $rule and $rule->addcall($name); + } + } + elsif ($grammar =~ m/$ATLEAST/gco) + { + _parse("a $1-or-more subrule match", $aftererror,$line,"$code$argcode($1..)"); + if ($2) + { + my $pos = pos $grammar; + substr($grammar,$pos,0, + "($1..) 
"); + + pos $grammar = $pos; + } + else + { + $item = new Parse::RecDescent::Repetition($name,"$1..",$1,$MAXREP, + $lookahead,$line, + $self, + $matchrule, + $argcode); + $prod and $prod->additem($item) + or _no_rule("repetition",$line,"$code$argcode($1..)"); + + !$matchrule and $rule and $rule->addcall($name); + _check_insatiable($name,"$1..",$grammar,$line) if $::RD_CHECK; + } + } + elsif ($grammar =~ m/$ATMOST/gco) + { + _parse("a one-to-$1 subrule match", $aftererror,$line,"$code$argcode(..$1)"); + if ($2) + { + my $pos = pos $grammar; + substr($grammar,$pos,0, + "(..$1) "); + + pos $grammar = $pos; + } + else + { + $item = new Parse::RecDescent::Repetition($name,"..$1",1,$1, + $lookahead,$line, + $self, + $matchrule, + $argcode); + $prod and $prod->additem($item) + or _no_rule("repetition",$line,"$code$argcode(..$1)"); + + !$matchrule and $rule and $rule->addcall($name); + } + } + elsif ($grammar =~ m/$BADREP/gco) + { + _parse("an subrule match with invalid repetition specifier", 0,$line); + _error("Incorrect specification of a repeated subrule", + $line); + _hint("Repeated subrules like \"$code$argcode$&\" cannot have + a maximum repetition of zero, nor can they have + negative components in their ranges."); + } + } + else + { + _parse("a subrule match", $aftererror,$line,$code); + my $desc; + if ($name=~/\A_alternation_\d+_of_production_\d+_of_rule/) + { $desc = $self->{"rules"}{$name}->expected } + $item = new Parse::RecDescent::Subrule($name, + $lookahead, + $line, + $desc, + $matchrule, + $argcode); + + $prod and $prod->additem($item) + or _no_rule("(sub)rule",$line,$name); + + !$matchrule and $rule and $rule->addcall($name); + } + } + elsif ($grammar =~ m/$LONECOLON/gco ) + { + _error("Unexpected colon encountered", $line); + _hint("Did you mean \"|\" (to start a new production)? + Or perhaps you forgot that the colon + in a rule definition must be + on the same line as the rule name?"); + } + elsif ($grammar =~ m/$ACTION/gco ) # BAD ACTION, ALREADY FAILED + { + _error("Malformed action encountered", + $line); + _hint("Did you forget the closing curly bracket + or is there a syntax error in the action?"); + } + elsif ($grammar =~ m/$OTHER/gco ) + { + _error("Untranslatable item encountered: \"$1\"", + $line); + _hint("Did you misspell \"$1\" + or forget to comment it out?"); + } + + if ($lookaheadspec =~ tr /././ > 3) + { + $lookaheadspec =~ s/\A\s+//; + $lookahead = $lookahead<0 + ? 'a negative lookahead ("...!")' + : 'a positive lookahead ("...")' ; + _warn(1,"Found two or more lookahead specifiers in a + row.",$line) + and + _hint("Multiple positive and/or negative lookaheads + are simply multiplied together to produce a + single positive or negative lookahead + specification. In this case the sequence + \"$lookaheadspec\" was reduced to $lookahead. 
+ Was this your intention?"); + } + $lookahead = 0; + $lookaheadspec = ""; + + $grammar =~ m/\G\s+/gc; + } + + unless ($ERRORS or $isimplicit or !$::RD_CHECK) + { + $self->_check_grammar(); + } + + unless ($ERRORS or $isimplicit or $Parse::RecDescent::compiling) + { + my $code = $self->_code(); + if (defined $::RD_TRACE) + { + print STDERR "printing code (", length($code),") to RD_TRACE\n"; + local *TRACE_FILE; + open TRACE_FILE, ">RD_TRACE" + and print TRACE_FILE "my \$ERRORS;\n$code" + and close TRACE_FILE; + } + + unless ( eval "$code 1" ) + { + _error("Internal error in generated parser code!"); + $@ =~ s/at grammar/in grammar at/; + _hint($@); + } + } + + if ($ERRORS and !_verbosity("HINT")) + { + local $::RD_HINT = 1; + _hint('Set $::RD_HINT (or -RD_HINT if you\'re using "perl -s") + for hints on fixing these problems.'); + } + if ($ERRORS) { $ERRORS=0; return } + return $self; +} + + +sub _addstartcode($$) +{ + my ($self, $code) = @_; + $code =~ s/\A\s*\{(.*)\}\Z/$1/s; + + $self->{"startcode"} .= "$code;\n"; +} + +# CHECK FOR GRAMMAR PROBLEMS.... + +sub _check_insatiable($$$$) +{ + my ($subrule,$repspec,$grammar,$line) = @_; + pos($grammar)=pos($_[2]); + return if $grammar =~ m/$OPTIONAL/gco || $grammar =~ m/$ANY/gco; + my $min = 1; + if ( $grammar =~ m/$MANY/gco + || $grammar =~ m/$EXACTLY/gco + || $grammar =~ m/$ATMOST/gco + || $grammar =~ m/$BETWEEN/gco && do { $min=$2; 1 } + || $grammar =~ m/$ATLEAST/gco && do { $min=$2; 1 } + || $grammar =~ m/$SUBRULE(?!\s*:)/gco + ) + { + return unless $1 eq $subrule && $min > 0; + _warn(3,"Subrule sequence \"$subrule($repspec) $&\" will + (almost certainly) fail.",$line) + and + _hint("Unless subrule \"$subrule\" performs some cunning + lookahead, the repetition \"$subrule($repspec)\" will + insatiably consume as many matches of \"$subrule\" as it + can, leaving none to match the \"$&\" that follows."); + } +} + +sub _check_grammar ($) +{ + my $self = shift; + my $rules = $self->{"rules"}; + my $rule; + foreach $rule ( values %$rules ) + { + next if ! $rule->{"changed"}; + + # CHECK FOR UNDEFINED RULES + + my $call; + foreach $call ( @{$rule->{"calls"}} ) + { + if (!defined ${$rules}{$call} + &&!defined &{"Parse::RecDescent::$call"}) + { + if (!defined $::RD_AUTOSTUB) + { + _warn(3,"Undefined (sub)rule \"$call\" + used in a production.") + and + _hint("Will you be providing this rule + later, or did you perhaps + misspell \"$call\"? Otherwise + it will be treated as an + immediate ."); + eval "sub $self->{namespace}::$call {undef}"; + } + else # EXPERIMENTAL + { + my $rule = $::RD_AUTOSTUB || qq{'$call'}; + _warn(1,"Autogenerating rule: $call") + and + _hint("A call was made to a subrule + named \"$call\", but no such + rule was specified. However, + since \$::RD_AUTOSTUB + was defined, a rule stub + ($call : $rule) was + automatically created."); + + $self->_generate("$call : $rule",0,1); + } + } + } + + # CHECK FOR LEFT RECURSION + + if ($rule->isleftrec($rules)) + { + _error("Rule \"$rule->{name}\" is left-recursive."); + _hint("Redesign the grammar so it's not left-recursive. + That will probably mean you need to re-implement + repetitions using the '(s)' notation. 
+ For example: \"$rule->{name}(s)\"."); + next; + } + } +} + +# GENERATE ACTUAL PARSER CODE + +sub _code($) +{ + my $self = shift; + my $code = qq{ +package $self->{namespace}; +use strict; +use vars qw(\$skip \$AUTOLOAD $self->{localvars} ); +\$skip = '$skip'; +$self->{startcode} + +{ +local \$SIG{__WARN__} = sub {0}; +# PRETEND TO BE IN Parse::RecDescent NAMESPACE +*$self->{namespace}::AUTOLOAD = sub +{ + no strict 'refs'; + \$AUTOLOAD =~ s/^$self->{namespace}/Parse::RecDescent/; + goto &{\$AUTOLOAD}; +} +} + +}; + $code .= "push \@$self->{namespace}\::ISA, 'Parse::RecDescent';"; + $self->{"startcode"} = ''; + + my $rule; + foreach $rule ( values %{$self->{"rules"}} ) + { + if ($rule->{"changed"}) + { + $code .= $rule->code($self->{"namespace"},$self); + $rule->{"changed"} = 0; + } + } + + return $code; +} + + +# EXECUTING A PARSE.... + +sub AUTOLOAD # ($parser, $text; $linenum, @args) +{ + croak "Could not find method: $AUTOLOAD\n" unless ref $_[0]; + my $class = ref($_[0]) || $_[0]; + my $text = ref($_[1]) ? ${$_[1]} : $_[1]; + $_[0]->{lastlinenum} = $_[2]||_linecount($_[1]); + $_[0]->{lastlinenum} = _linecount($_[1]); + $_[0]->{lastlinenum} += $_[2] if @_ > 2; + $_[0]->{offsetlinenum} = $_[0]->{lastlinenum}; + $_[0]->{fulltext} = $text; + $_[0]->{fulltextlen} = length $text; + $_[0]->{deferred} = []; + $_[0]->{errors} = []; + my @args = @_[3..$#_]; + my $args = sub { [ @args ] }; + + $AUTOLOAD =~ s/$class/$_[0]->{namespace}/; + no strict "refs"; + + croak "Unknown starting rule ($AUTOLOAD) called\n" + unless defined &$AUTOLOAD; + my $retval = &{$AUTOLOAD}($_[0],$text,undef,undef,$args); + + if (defined $retval) + { + foreach ( @{$_[0]->{deferred}} ) { &$_; } + } + else + { + foreach ( @{$_[0]->{errors}} ) { _error(@$_); } + } + + if (ref $_[1]) { ${$_[1]} = $text } + + $ERRORS = 0; + return $retval; +} + +sub _parserepeat($$$$$$$$$$) # RETURNS A REF TO AN ARRAY OF MATCHES +{ + my ($parser, $text, $prod, $min, $max, $_noactions, $expectation, $argcode) = @_; + my @tokens = (); + + my $reps; + for ($reps=0; $reps<$max;) + { + $_[6]->at($text); # $_[6] IS $expectation FROM CALLER + my $_savetext = $text; + my $prevtextlen = length $text; + my $_tok; + if (! defined ($_tok = &$prod($parser,$text,1,$_noactions,$argcode))) + { + $text = $_savetext; + last; + } + push @tokens, $_tok if defined $_tok; + last if ++$reps >= $min and $prevtextlen == length $text; + } + + do { $_[6]->failed(); return undef} if $reps<$min; + + $_[1] = $text; + return [@tokens]; +} + + +# ERROR REPORTING.... + +my $errortext; +my $errorprefix; + +open (ERROR, ">&STDERR"); +format ERROR = +@>>>>>>>>>>>>>>>>>>>>: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< +$errorprefix, $errortext +~~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $errortext +. + +select ERROR; +$| = 1; + +# TRACING + +my $tracemsg; +my $tracecontext; +my $tracerulename; +use vars '$tracelevel'; + +open (TRACE, ">&STDERR"); +format TRACE = +@>|@|||||||||@^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<| +$tracelevel, $tracerulename, '|', $tracemsg + | ~~ |^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<| + $tracemsg +. + +select TRACE; +$| = 1; + +open (TRACECONTEXT, ">&STDERR"); +format TRACECONTEXT = +@>|@|||||||||@ |^<<<<<<<<<<<<<<<<<<<<<<<<<<< +$tracelevel, $tracerulename, '|', $tracecontext + | ~~ | |^<<<<<<<<<<<<<<<<<<<<<<<<<<< + $tracecontext +. 
+ + +select TRACECONTEXT; +$| = 1; + +select STDOUT; + +sub _verbosity($) +{ + defined $::RD_TRACE + or defined $::RD_HINT and $_[0] =~ /ERRORS|WARN|HINT/ + or defined $::RD_WARN and $_[0] =~ /ERRORS|WARN/ + or defined $::RD_ERRORS and $_[0] =~ /ERRORS/ +} + +sub _error($;$) +{ + $ERRORS++; + return 0 if ! _verbosity("ERRORS"); + $errortext = $_[0]; + $errorprefix = "ERROR" . ($_[1] ? " (line $_[1])" : ""); + $errortext =~ s/\s+/ /g; + print ERROR "\n" if _verbosity("WARN"); + write ERROR; + return 1; +} + +sub _warn($$;$) +{ + return 0 unless _verbosity("WARN") && ($::RD_HINT || $_[0] >= ($::RD_WARN||1)); + $errortext = $_[1]; + $errorprefix = "Warning" . ($_[2] ? " (line $_[2])" : ""); + print ERROR "\n"; + $errortext =~ s/\s+/ /g; + write ERROR; + return 1; +} + +sub _hint($) +{ + return 0 unless defined $::RD_HINT; + $errortext = "$_[0])"; + $errorprefix = "(Hint"; + $errortext =~ s/\s+/ /g; + write ERROR; + return 1; +} + +sub _tracemax($) +{ + if (defined $::RD_TRACE + && $::RD_TRACE =~ /\d+/ + && $::RD_TRACE>1 + && $::RD_TRACE+10..." + . substr($_[0],-$::RD_TRACE/2); + } + else + { + return $_[0]; + } +} + +sub _tracefirst($) +{ + if (defined $::RD_TRACE + && $::RD_TRACE =~ /\d+/ + && $::RD_TRACE>1 + && $::RD_TRACE+10"; + } + else + { + return $_[0]; + } +} + +my $lastcontext = ''; +my $lastrulename = ''; +my $lastlevel = ''; + +sub _trace($;$$$) +{ + $tracemsg = $_[0]; + $tracecontext = $_[1]||$lastcontext; + $tracerulename = $_[2]||$lastrulename; + $tracelevel = $_[3]||$lastlevel; + if ($tracerulename) { $lastrulename = $tracerulename } + if ($tracelevel) { $lastlevel = $tracelevel } + + $tracecontext =~ s/\n/\\n/g; + $tracecontext =~ s/\s+/ /g; + $tracerulename = qq{$tracerulename}; + write TRACE; + if ($tracecontext ne $lastcontext) + { + if ($tracecontext) + { + $lastcontext = _tracefirst($tracecontext); + $tracecontext = qq{"$tracecontext"}; + } + else + { + $tracecontext = qq{}; + } + write TRACECONTEXT; + } +} + +sub _parseunneg($$$$) +{ + _parse($_[0],$_[1],$_[3]); + if ($_[2]<0) + { + _error("Can't negate \"$&\".",$_[3]); + _hint("You can't negate $_[0]. Remove the \"...!\" before + \"$&\"."); + return 0; + } + return 1; +} + +sub _parse($$$;$) +{ + my $what = $_[3] || $&; + $what =~ s/^\s+//; + if ($_[1]) + { + _warn(3,"Found $_[0] ($what) after an unconditional ",$_[2]) + and + _hint("An unconditional always causes the + production containing it to immediately fail. + \u$_[0] that follows an + will never be reached. Did you mean to use + instead?"); + } + + return if ! _verbosity("TRACE"); + $errortext = "Treating \"$what\" as $_[0]"; + $errorprefix = "Parse::RecDescent"; + $errortext =~ s/\s+/ /g; + write ERROR; +} + +sub _linecount($) { + scalar substr($_[0], pos $_[0]||0) =~ tr/\n// +} + + +package main; + +use vars qw ( $RD_ERRORS $RD_WARN $RD_HINT $RD_TRACE $RD_CHECK ); +$::RD_CHECK = 1; +$::RD_ERRORS = 1; +$::RD_WARN = 3; + +1; + diff --git a/perl/Template.pm b/perl/Template.pm new file mode 100644 index 000000000..76c84c735 --- /dev/null +++ b/perl/Template.pm @@ -0,0 +1,916 @@ +#============================================================= -*-perl-*- +# +# Template +# +# DESCRIPTION +# Module implementing a simple, user-oriented front-end to the Template +# Toolkit. +# +# AUTHOR +# Andy Wardley +# +# COPYRIGHT +# Copyright (C) 1996-2009 Andy Wardley. All Rights Reserved. +# +# This module is free software; you can redistribute it and/or +# modify it under the same terms as Perl itself. 
+# +#======================================================================== + +package Template; + +use strict; +use warnings; +use 5.006; +use base 'Template::Base'; + +use Template::Config; +use Template::Constants; +use Template::Provider; +use Template::Service; +use File::Basename; +use File::Path; +use Scalar::Util qw(blessed); + +our $VERSION = '2.22'; +our $ERROR = ''; +our $DEBUG = 0; +our $BINMODE = 0 unless defined $BINMODE; +our $AUTOLOAD; + +# preload all modules if we're running under mod_perl +Template::Config->preload() if $ENV{ MOD_PERL }; + + +#------------------------------------------------------------------------ +# process($input, \%replace, $output) +# +# Main entry point for the Template Toolkit. The Template module +# delegates most of the processing effort to the underlying SERVICE +# object, an instance of the Template::Service class. +#------------------------------------------------------------------------ + +sub process { + my ($self, $template, $vars, $outstream, @opts) = @_; + my ($output, $error); + my $options = (@opts == 1) && ref($opts[0]) eq 'HASH' + ? shift(@opts) : { @opts }; + + $options->{ binmode } = $BINMODE + unless defined $options->{ binmode }; + + # we're using this for testing in t/output.t and t/filter.t so + # don't remove it if you don't want tests to fail... + $self->DEBUG("set binmode\n") if $DEBUG && $options->{ binmode }; + + $output = $self->{ SERVICE }->process($template, $vars); + + if (defined $output) { + $outstream ||= $self->{ OUTPUT }; + unless (ref $outstream) { + my $outpath = $self->{ OUTPUT_PATH }; + $outstream = "$outpath/$outstream" if $outpath; + } + + # send processed template to output stream, checking for error + return ($self->error($error)) + if ($error = &_output($outstream, \$output, $options)); + + return 1; + } + else { + return $self->error($self->{ SERVICE }->error); + } +} + + +#------------------------------------------------------------------------ +# service() +# +# Returns a reference to the the internal SERVICE object which handles +# all requests for this Template object +#------------------------------------------------------------------------ + +sub service { + my $self = shift; + return $self->{ SERVICE }; +} + + +#------------------------------------------------------------------------ +# context() +# +# Returns a reference to the the CONTEXT object withint the SERVICE +# object. 
+
+
+#========================================================================
+#                        -- PRIVATE METHODS --
+#========================================================================
+
+#------------------------------------------------------------------------
+# _init(\%config)
+#------------------------------------------------------------------------
+sub _init {
+    my ($self, $config) = @_;
+
+    # convert any textual DEBUG args to numerical form
+    my $debug = $config->{ DEBUG };
+    $config->{ DEBUG } = Template::Constants::debug_flags($self, $debug)
+        || return if defined $debug && $debug !~ /^\d+$/;
+
+    # prepare a namespace handler for any CONSTANTS definition
+    if (my $constants = $config->{ CONSTANTS }) {
+        my $ns  = $config->{ NAMESPACE } ||= { };
+        my $cns = $config->{ CONSTANTS_NAMESPACE } || 'constants';
+        $constants = Template::Config->constants($constants)
+            || return $self->error(Template::Config->error);
+        $ns->{ $cns } = $constants;
+    }
+
+    $self->{ SERVICE } = $config->{ SERVICE }
+        || Template::Config->service($config)
+        || return $self->error(Template::Config->error);
+
+    $self->{ OUTPUT      } = $config->{ OUTPUT } || \*STDOUT;
+    $self->{ OUTPUT_PATH } = $config->{ OUTPUT_PATH };
+
+    return $self;
+}
+
+
+#------------------------------------------------------------------------
+# _output($where, \$text, $options)
+#------------------------------------------------------------------------
+
+sub _output {
+    my ($where, $textref, $options) = @_;
+    my $reftype;
+    my $error = 0;
+
+    # call a CODE reference
+    if (($reftype = ref($where)) eq 'CODE') {
+        &$where($$textref);
+    }
+    # print to a glob (such as \*STDOUT)
+    elsif ($reftype eq 'GLOB') {
+        print $where $$textref;
+    }
+    # append output to a SCALAR ref
+    elsif ($reftype eq 'SCALAR') {
+        $$where .= $$textref;
+    }
+    # push onto ARRAY ref
+    elsif ($reftype eq 'ARRAY') {
+        push @$where, $$textref;
+    }
+    # call the print() method on an object that implements the method
+    # (e.g. IO::Handle, Apache::Request, etc)
+    elsif (blessed($where) && $where->can('print')) {
+        $where->print($$textref);
+    }
+    # a simple string is taken as a filename
+    elsif (! $reftype) {
+        local *FP;
+        # make destination directory if it doesn't exist
+        my $dir = dirname($where);
+        eval { mkpath($dir) unless -d $dir; };
+        if ($@) {
+            # strip file name and line number from error raised by die()
+            ($error = $@) =~ s/ at \S+ line \d+\n?$//;
+        }
+        elsif (open(FP, ">$where")) {
+            # binmode option can be 1 or a specific layer, e.g. :utf8
+            my $bm = $options->{ binmode };
+            if ($bm && $bm eq 1) {
+                binmode FP;
+            }
+            elsif ($bm) {
+                binmode FP, $bm;
+            }
+            print FP $$textref;
+            close FP;
+        }
+        else {
+            $error = "$where: $!";
+        }
+    }
+    # give up, we've done our best
+    else {
+        $error = "output_handler() cannot determine target type ($where)\n";
+    }
+
+    return $error;
+}
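+
+# For example, each of the output targets dispatched on above can be
+# reached through process() (an illustrative sketch; $tt and $vars are
+# assumed to exist):
+#
+#     $tt->process('page.tt2', $vars, \my $buffer);    # append to scalar
+#     $tt->process('page.tt2', $vars, \@lines);        # push onto array
+#     $tt->process('page.tt2', $vars, \*STDERR);       # print to a glob
+#     $tt->process('page.tt2', $vars, 'out/page.html', # write to file,
+#                  binmode => ':utf8');                # with an IO layer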
+
+1;
+
+__END__
+
+=head1 NAME
+
+Template - Front-end module to the Template Toolkit
+
+=head1 SYNOPSIS
+
+    use Template;
+
+    # some useful options (see below for full list)
+    my $config = {
+        INCLUDE_PATH => '/search/path',  # or list ref
+        INTERPOLATE  => 1,               # expand "$var" in plain text
+        POST_CHOMP   => 1,               # cleanup whitespace
+        PRE_PROCESS  => 'header',        # prefix each template
+        EVAL_PERL    => 1,               # evaluate Perl code blocks
+    };
+
+    # create Template object
+    my $template = Template->new($config);
+
+    # define template variables for replacement
+    my $vars = {
+        var1 => $value,
+        var2 => \%hash,
+        var3 => \@list,
+        var4 => \&code,
+        var5 => $object,
+    };
+
+    # specify input filename, or file handle, text reference, etc.
+    my $input = 'myfile.html';
+
+    # process input template, substituting variables
+    $template->process($input, $vars)
+        || die $template->error();
+
+=head1 DESCRIPTION
+
+This documentation describes the Template module which is the direct
+Perl interface into the Template Toolkit.  It covers the use of the
+module and gives a brief summary of configuration options and template
+directives.  Please see L<Template::Manual> for the complete reference
+manual which goes into much greater depth about the features and use
+of the Template Toolkit.  The L<Template::Tutorial> is also available
+as an introductory guide to using the Template Toolkit.
+
+=head1 METHODS
+
+=head2 new(\%config)
+
+The C<new()> constructor method (implemented by the
+L<Template::Base> base class) instantiates a new
+C<Template>