Meeting 2011 05
        Tomislav Janjusic edited this page Jan 6, 2024 
        ·
        1 revision
      
- Dates: May 3-5, 2011 (Tuesday - Thursday)
- Time: 9 am - 5 pm Eastern each day
- Location: ORNL, Oak Ridge, TN
Audio, video, and screen conferencing are available via WebEx; see below for details.
- The meeting will be held at ORNL.
- Oak Ridge Hotels
- There are other hotels in the Turkey Creek/West Knoxville area that are also relatively close to the lab. (Google Map)
Below are the agenda items that I have gathered so far (in no particular order):
- MPI 2.2 implementation tickets
- MPI 3.0 implementation planning
- ORNL: Hierarchical Collectives discussion
- Runtime integration discussion
- New Process Affinity functionality
- New user interface
- Possibly implementation details
 
- Update on ORTE development
- State machine overview and discussion
 
- Fault tolerance feature development and integration (C/R, logging, replication, FT-MPI, MPI 3.0, message reliability, ...)
- Survey what is: (working, not working, not supported)
- Available in the trunk
- Available in a current release
- In development
 
- Update on the sensor framework
- Create a FAQ entry for each supported FT technique, and how to use it
- Is there anything we can share regarding code in development
 
- Survey what is: (working, not working, not supported)
- Threading design discussion
- Threading in the point-to-point stack (PML/BML/BTL/MPool)
- Other threading problem areas (e.g., Attribute locking issue that emerged with ROMIO)
- What level of MPI_THREAD_MULTIPLE do we want to (can we practically) support?
- NUMA safety issues (e.g., write/read barriers)
- George's work on threading in the TCP BTL
 
- Testing infrastructure (MTT)
- Discuss current setup, and state of testing
- Review individual MTT setups
- Discuss automated testing of command line/MCA options (coverage tracking)
 
- MPI Extensions Update
- ORTE Shared Memory Update
Additional topics that folks might want to discuss. Does anyone want to add these to the agenda and lead the discussions?
- Performance tuning (Point-to-point and/or Collective)
- Brian Barrett (Sandia)
- Wesley Bland (UTK)
- Shiqing Fan (HLRS)
- Rich Graham (ORNL)
- Samuel Gutierrez (LANL)
- Nathan Hjelm (LANL)
- Josh Hursey (ORNL)
- Yevgeny Kliteynik (Mellanox)
- Pavel (Pasha) Shamis (ORNL)
- Jeff Squyres (Cisco)
- Bin Wang (Auburn University)
Below is a sketch of the agenda. It will be finalized the week before the meeting, but is always subject to change.
- Tuesday, May 3 (Building 5100 Room 262)
- Note: We will not be having the Open MPI Teleconference.
- Webex password: ompi, (stale link)
 
| Start | End | Leader | Topic | 
|---|---|---|---|
| 9:00 am | 9:30 am | All | Logistics, etc | 
| 9:30 am | Noon | All | MPI 2.2 and 3.0 Implementation Tickets and Planning | 
| Noon | 1:00 pm | -- Lunch -- | |
| 1:00 pm | 2:00 pm | ORNL | Hierarchical Collectives discussion | 
| 2:00 pm | 3:00 pm | Oracle/Cisco/ORNL | New Process Affinity functionality | 
| 3:00 pm | 3:30 pm | -- Break -- | |
| 3:30 pm | 5:00 pm | ALL | Testing Infrastructure (MTT) | 
- Wednesday, May 4 (Building 5100 Room 262)
- Webex password: ompi, (stale link)
 
| Start | End | Leader | Topic | 
|---|---|---|---|
| 9:00 am | Noon | IBM/All | Threading Design | 
| Noon | 1:00 pm | -- Lunch -- | |
| 1:00 pm | 2:00 pm | NVIDIA | CUDA Support | 
| 2:00 pm | 3:00 pm | Cisco | ORTE Development Update | 
| 3:00 pm | 3:30 pm | -- Break -- | |
| 3:30 pm | 5:00 pm | ORNL | Runtime integration discussion | 
- Thursday, May 5 (Building 5700 Room L202)
- Webex password: ompi, (stale link)
 
| Start | End | Leader | Topic | 
|---|---|---|---|
| 9:00 am | Noon | IBM/All | Threading Design (as needed) | 
| 10:30 am | 11:00 am | ORNL/Cisco | MPI Extensions Update | 
| Noon | 1:00 pm | -- Lunch -- | |
| 1:00 pm | 2:00 pm | LANL/Cisco/UTK | ORTE Shared Memory Update | 
| 2:00 pm | 3:00 pm | All | Wrap up and next steps discussion | 
| 3:00 pm | 3:30 pm | -- Break -- | |
| 3:30 pm | 5:00 pm | ORNL | Fault tolerance feature development and integration | 
Tuesday:
- MPI 2.2 Report: {18}
- Need to create a custom report for both 2.2 and 3.0 pending tickets for quick reference. (Done - Josh)
- trac ticket 1368: Projected to be less than a day of work. A summer student at ORNL might be available.
- trac ticket 2221: Algorithm addition for MPI_IN_PLACE in MPI_Exscan. Rolf/NVIDIA might have some cycles to look into this.
- trac ticket 2223: Jeff/Cisco has it half integrated in a bitbucket branch. 1-2 weeks of work left for someone to finish it off.
- trac ticket 2699: George/UTK thinks it is ready. Jeff to follow up.
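
As background for the MPI_IN_PLACE item in ticket 2221, here is a minimal single-process sketch of the semantics involved, not the Open MPI implementation. In an exclusive scan, rank i receives the reduction of the values contributed by ranks 0 through i-1, and with MPI_IN_PLACE each rank's input is read from its receive buffer and then overwritten by the result. The function name `exscan_in_place` and the list-of-buffers model are illustrative assumptions. MPI leaves rank 0's result undefined, and this sketch simply leaves that slot untouched.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def exscan_in_place(bufs: List[T], op: Callable[[T, T], T]) -> None:
    """Simulate an in-place exclusive scan over len(bufs) ranks.

    bufs[i] models rank i's receive buffer, which also holds its input.
    Hypothetical helper for illustration only; not an Open MPI or MPI API.
    """
    running = None  # reduction of the inputs of ranks 0..rank-1
    for rank in range(len(bufs)):
        # Save the input before it is overwritten: this is the detail
        # that makes the in-place case different from separate buffers.
        contribution = bufs[rank]
        if rank > 0:
            bufs[rank] = running
        # Rank 0's buffer is left as-is (MPI says it is undefined).
        running = contribution if rank == 0 else op(running, contribution)


values = [1, 2, 3, 4]
exscan_in_place(values, operator.add)
print(values)
```

The point of the sketch is the ordering constraint: each rank's contribution has to be captured before its buffer is replaced by the partial result.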
 
- MPI 3.0 {17}
- trac ticket 2715: Non-blocking collectives work ongoing at ORNL.
- trac ticket 2716: Brian/Sandia to take a look at re-integration of the tmp branch. For 'ob1' it is just a bit of cleanup. For 'cm' it may not be implementable until the hardware catches up, so err_not_implemented will likely be returned.
- Long term items that could be voted on this year.
- New RMA: Needs progress thread support... Brian/Sandia to lead this effort
- Tools: Jeff/Cisco to look into who might be interested in implementing (talk to HLRS)
- Fault Tolerance: ORNL is leading the working group and the Open MPI prototype. More on this on Thursday.
- Fortran: Craig/LANL is working on a prototype
- MPI_Count: ??
- Timers: Jeff/Cisco to look into this if it passes.
- MPI const: Jeff/Cisco to look into this if it passes.
 
 
- Hierarchical Collectives
- Lots of good stuff presented from a couple of accepted and out-going papers. I'll leave the rest of the notes for those papers.
 
- Affinity
- Intention is to create a single, flexible mapper, then build the common abstraction on top of it. This eases the maintenance burden on Open MPI developers.
- The proposed algorithm is a bit complex.
- Slides presented will be made available on demand.
- Josh/ORNL, Jeff/Cisco and Terry/Oracle are working on writing this up at the moment.
- Items to look at:
- Ordering as a separate step.
- Stride is useful in multithreaded applications that do not allocate threads with respect to specific cache levels or hardware boundaries.
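
To make the stride item concrete, here is a small sketch of one possible strided placement. It is only an illustration of what a stride could mean; it is not the mapper algorithm proposed at the meeting, and the function name `map_with_stride` and its parameters are hypothetical. Processes are placed on every `stride`-th core, and once the end of the core list is reached, placement restarts at the next unused offset.

```python
from typing import List


def map_with_stride(num_procs: int, num_cores: int, stride: int) -> List[int]:
    """Return a core index for each process using a simple strided walk.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    if num_cores < 1 or stride < 1:
        raise ValueError("num_cores and stride must be at least 1")
    # Visit cores 0, stride, 2*stride, ..., then 1, 1+stride, ... so that
    # every core appears exactly once in the visiting order.
    order = [
        core
        for offset in range(stride)
        for core in range(offset, num_cores, stride)
    ]
    # Wrap around if there are more processes than cores.
    return [order[i % num_cores] for i in range(num_procs)]


print(map_with_stride(6, 8, 2))
```

With a stride of 2 on 8 cores, the first four processes land on the even-numbered cores, leaving the neighboring cores free for threads that each process may spawn.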
 
 
- MTT Testing
- Introduced a few new members to how to set up and run MTT.
- Not much outside of that was discussed regarding testing.
 
Wednesday:
- TBD
Thursday:
- TBD