
On Games(SG14) and TM(SG5) from The View at the May 2015 C++ Standard meeting in Lenexa

  

The yellow brick road starts here in Kansas (actually Lenexa), hosted by Perceptive Software, and it is called C++17. The meeting led off with a major discussion on the philosophy of C++17 in a full evening session on Monday night. This was motivated by an email from Bjarne just before the meeting. Here are some aspects of that email, which I had accidentally posted in slide form to an external Games development discussion site while preparing for keynotes I was giving at Parallel 2015 and ADC++ 2015 before the meeting, and which were picked up by many people.

https://groups.google.com/forum/#!topic/unofficial-real-time-cxx/j8gDKf4SzKM

This forum was the precursor to a discussion on improving C++ for Games development, where I had been carrying on a conversation with many games developers since October of 2014. That has now evolved into an official Study Group for C++ in the form of SG14: Games Dev/Low Latency. It now has its own mailing list, which I invite Games developers and others interested in Low Latency (such as graphics and real-time requirements) to join. More on this later.

 

Much has happened since the last set of C++ Standard meetings, encapsulated in this triple trip report for Urbana, Skillman, and Cologne. The major outcome of this meeting is 5 Technical Specifications that were published or are potentially publishable following this meeting. These are now all possible C++17 candidates. They are (with links to the latest versions)

All have been approved to publish, some with provisions for a final pass. This is in addition to File Systems, which was approved previously. Concepts is also considered complete even though its due date fell during the meeting, as we had early review of comments from four National Bodies (Canada, US, Finland, and UK), and there was substantial review in Lenexa, where CWG met with EWG to consider the design-level comments on Concepts among these early comments. The only one of interest is US3 on constrained auto, which was accepted and implemented immediately in the updated draft; all others were rejected for this TS and set aside for future consideration. There will be a Concepts telecon on Monday, June 29, following the formal close of the ballot on May 13. Now that the ballot has closed, there have been no additional or new comments other than one that was already expressed by the early NB submitters. At the Concepts telecon, there will be a decision to approve responses to NB comments and either publish or defer final review to Kona. The ultimate publication date will depend on that final decision and on whether any major issues arise. All these TSs will be considered in some form for C++17.


Other Study Groups are starting up too. SG6 Numerics is really getting rolling now with a number of proposals. SG7 Reflection is also reflecting on design decisions with many papers. SG9 was extremely active with a new proposal for range design by Eric Niebler. SG10 Feature Test helped guide SG1’s Parallelism and Concurrency proposals as well as SG5’s Transactional Memory proposal. Every TS will have a proposal for a feature-test macro, or an explanation of why it is not needed, although this is not normative and is not part of the IS. SG12 Undefined Behavior did not meet. SG13 2D Graphics had an I/O paper that was handled in the core subgroup. It is not complete but is 80% done.


SG1 officially voted out the Parallelism TS1 and is preparing for TS2. It was decided that future TSs, at least for Parallelism, will be cumulative on top of the current TS, to ensure that changes are always in the latest TS rather than having to be pieced together from multiple TSs as a union. This means newer TSs could remove or change content from a previous TS. We will likely adopt this cumulative approach for Transactional Memory. It is not clear whether this approach will work for Concurrency. A future Parallelism TS2 will consider task regions, executor algorithms, vector/SIMD, and fork-join parallelism similar to Cilk and OpenMP, as well as better wording for progress guarantees.

The Concurrency TS was also voted to publish. It contains better futures, latches, barriers, and atomic smart pointers, but atomic_unique_ptr was removed. Concurrency TS2 candidates are Executors, Variadic lock guard, Atomic views and array views, possibly resumable functions in the form of stackless or stackful co-routines/fibers, counters, queues, concurrent vectors, unordered associative containers, and upgradable locks.

However, co-routines have moved firmly to Evolution because it is not just a concurrency issue but an overall paradigm of programming that needs to be considered for C++ as a whole.


Evolution WG gave design guidance on the Transactional Memory National Body comments and the Concepts National Body comments. Both progressed afterwards through Core and Library to complete final changes. They also gave directional guidance on the Array TS. At this point the proposal for an Array of Runtime Bound still seems to be dead. Of the several resumable function proposals, Gor Nishanov’s stackless co-routine language proposal, named resumable functions, was approved to move forward because it had already had significant reviews before all these alternatives were proposed. Chris Kohlhoff’s stackless co-routine language proposal based on lambda expressions, named resumable expressions, had no consensus. The other co-routine proposal moving forward is a library proposal by Nat Goodspeed, which is also progressing through Library Working Group.


Other interesting proposals moving forward are default comparisons and the dot operator, something many people have been asking about. Discussions have also started on Reflection, Modules, and Contracts; there were multiple proposals for each, and it seems the multiple approaches will need to be unified. Ranges is another major theme, starting as a possible STL replacement/addition, aka STLv2. Eric Niebler's latest updated draft adds TS-like wording to his previous proposal, which was exposed to the committee at the last Standard meeting, and is now augmented with Concepts as was requested. This is now working its way through LEWG and soon LWG. I think these all have potential for entry into C++17 in some form, but they would more likely start life as a TS because of their size.


Library WG as usual had more polls than any other working group. They processed 26 Defect Reports but had 85 new issues. The only interestingly controversial issue was the proposal to make the noexcept specifier part of the type. This has a potential ABI impact as well as mangling changes. There was general concern that this needs wider consideration for its library and backwards compatibility breakage. It was deferred to the next Standard meeting in Kona as there was no immediate urgency.


Library Evolution WG considered 30 papers and has started Library Fundamentals 2 TS. They used a workshop format, working in small groups within the same room to get more parallelism in working through the papers. The most interesting paper reviewed is a design change proposed for ranges by Eric Niebler. This is a potential new form that can replace or augment iterators. The review was completed, the author was given encouragement, and there will be a telecon on the Range design before it is passed to Library WG.

At the end, there was also one vote from LWG on the idea of bringing in the Special Math IS and making it part of C++17, conditionally supported. Based on N4437, it proposes to merge International Standard 29124:2010, “Extensions to the C++ Library to support mathematical special functions,” into C++17 as a conditionally-supported standard library feature. There was a large representation of Scientific Labs at this meeting, so this would have been the best chance to get it voted into C++17. The end result, however, lacked consensus to move it in, so it stays as is.


I was busy in this meeting, as in the last 3 years of meetings, guiding the Transactional Memory TS through to publication. Now that it is done, I can take some time to start another project, although TM will likely continue in a second version with additional features based on implementation and industry experience from Wyatt Technologies and other areas.

 

One of my passions, when I used to have time, was computer games, especially the use of increasingly accelerated graphics. I no longer have as much time for games, but I still maintain a keen interest in watching for new releases and observing how hard they push the hardware and software. It so happens that Games and C++ are a match, as many game engines are programmed either in C++ or in a combination of C++ and other languages.

In 2007, Paul Pedriana, who was at the time a maintainer of Electronic Arts' STL library, published a paper to the C++ Standard committee which went almost unnoticed.


N2271 “EASTL -- Electronic Arts Standard Template Library”


The reason it went unnoticed was that the Standard committee at the time was busy with C++11 revisions, but that is not really a good excuse. Sometimes papers do fall through, although with today's triage system that is less likely. But I remember reading the paper at the time because of my interest in Games, and I wanted to do something about it when the dust settled.

That time came in 2014 at CPPCon, when a group of Games developers gathered to present several talks on how C++ is used in Games.


In subsequent conversation with Paul Pedriana and others, it seems C++ is used almost universally in games on platforms which freely support C++. The vast majority of serious games written for desktop platforms (Windows, Mac OS, Linux), console platforms (PS4, XBox, Wii), and hand-held platforms (3DS, Shield) are in C++. Even C is almost unheard of in game development. Nearly all game engines are written in C++, though some of them expose a non-C++ scripting interface to the engine user (e.g. Unity is written in C++ but exposes a .Net interface). Game engines written in C++ include all the major game engines, such as Unreal Engine, Unity, Frostbite, Source Engine, Havok, Ogre, etc. On Android C++ is available only in native mode development, which is somewhat tedious to use. As a result, only highly performance-sensitive games tend to be developed in C++ on Android. iOS supports both Objective C and C++, but any application is effectively forced into using at least some Objective C in order to interface with the OS. iOS games tend to be written in Objective C or a mix of Objective C and C++.

 

When the issue of how to improve C++ for Games came up, I brought up N2271 and the games developers rallied around me to host an impromptu BoF on this where we discussed various techniques for overcoming the challenges C++ presents to Games. Some of those include:

  • Control: game developers need knobs and sliders 
  • Reliability: Code must do what it's told independent of platform or implementation
  • Metrics: Hard milestone dates leave no room for surprises
  • Performance: Buttery smooth Hollywood-quality graphics on aging, commodity hardware
  • Programmer Iteration: REPL-like testing of gameplay changes and features
  • Standard Support: Third party code interaction with common game industry requirements and platforms


After CPPCon, Sean Middleditch of Wargaming took on the mantle of gathering some of the discussion and started a Google group to continue it, which I joined while I was in my crazy period of travelling around the world twice. This group continued the discussion from Oct 2014 to May 2015, during which time I helped to gather what was discussed into a paper, N4456, proposing to the Lenexa C++ Standard meeting that we start an SG.


I am only leading this because one of the worst constraints of all is the tightly driven, crazy schedule of game developers, which leaves little room to attend Standard meetings, whereas one of my many jobs is to do exactly that and to represent Canada, where there happen to be many game development houses and subsidiaries.


The presentation to Evolution WG was a success, as there was resounding interest in starting an SG. Thus SG14 Games Dev/Low Latency was formed. Given that Games developers cannot freely attend C++ Standard meetings, it was decided that we will go to where they are. So we decided to host an official meeting on the Wednesday of CPPCon 2015, as well as on one of the days of GDC2016 in San Francisco. Other venues such as E3 or CES are also possible if that is where games developers might gather. A Sony representative was present, and he commented that Sony would be happy to fund the GDC2016 gathering.


The end result is that we now have our own official ISOCPP reflector for SG14 where discussions can continue. 

The name of this subgroup has been under debate because there is a cross intersection of interests.

From the start, we felt the name should cover Gaming, Graphics, and Finance interests. But there is no easy way to encapsulate all that in the name. This is because the general theme, as you can see from the above requirement list, is low latency and (soft) real-time capability in C++, as required by Games, Financial applications, Flight Simulators, 3D Graphics, and Virtual Reality applications. In fact, I am open to changing the name if it means a more direct connection, but Games have been our first direct stakeholders, and low latency describes the specific requirement within Games extremely well. However, since its inception, we have had support from many segments within the Financial industry, the Flight Simulator industry, and others.

In fact, I would say nothing I have done in the last 15 years in the various Standards, not even Transactional Memory, has gathered so many direct queries from people who are interested in joining and helping this effort. If you are interested, please go to the SG14 link or send me an email.


I will say more on this development in future, but I do want to devote some time in this blog to discussing what the Transactional Memory TS does. The constructs are actually quite simple, as shown in this single slide from one of my recent keynotes at CERN:

 

1 construct for transactions
  1. Compound statements

2 keywords for different types of TX
  atomic_noexcept | atomic_commit | atomic_cancel { <compound-statement> }
  synchronized { <compound-statement> }

1 function/function pointer keyword
  transaction_safe
  transaction_safe dynamic
  - must be a keyword because it conveys necessary semantics on type

1 function/function pointer attribute
  [[transaction_unsafe]]
  - provides static checking and performance hints, so it can be an attribute
  [[optimized_for_synchronized]]
  - provides a speculative version for synchronized blocks for the common case when no unsafe functions are called

 

We introduce two kinds of blocks to exploit transactional memory: synchronized blocks and atomic blocks. Synchronized blocks behave as if all synchronized blocks were protected by a single global recursive mutex. Atomic blocks (also called atomic transactions, or just transactions) appear to execute atomically and not concurrently with any synchronized block (unless the atomic block is executed within the synchronized block). Some operations are prohibited within atomic blocks because it may be impossible, difficult, or expensive to support executing them in atomic blocks; such operations are called transaction-unsafe. An atomic block also specifies how to handle an exception thrown but not caught within the atomic block.

Some noteworthy points about synchronized and atomic blocks:


Data races: Operations executed within synchronized or atomic blocks do not form data races with each other. However, they may form data races with operations not executed within any synchronized or atomic block. As usual, programs with data races have undefined semantics.


Exceptions: When an exception is thrown but not caught within an atomic block, the effects of operations executed within the block may take effect or be discarded, or std::abort may be called. This behavior is selected by the exception specifier of the atomic block, as described below. An atomic block whose effects are discarded is said to be canceled. An atomic block that completes without its effects being discarded, and without calling std::abort, is said to be committed.


Transaction-safety: As mentioned above, transaction-unsafe operations are prohibited within an atomic block. This restriction applies not only to code in the body of an atomic block, but also to code in the body of functions called (directly or indirectly) within the atomic block. To support static checking of this restriction, we introduce a keyword to declare that a function or function pointer is transaction-safe, and augment the type of a function or function pointer to specify whether it is transaction-safe.

We also introduce an attribute to explicitly declare that a function is not transaction-safe.

To reduce the burden of declaring functions transaction-safe, a function is assumed to be transaction-safe if its definition does not contain any transaction-unsafe code and it is not explicitly declared transaction-unsafe. Furthermore, unless declared otherwise, a non-virtual function whose definition is unavailable is assumed to be transaction-safe. (This assumption does not apply to virtual functions because the callee is not generally known statically to the caller.) These assumptions are checked at link time.

synchronized {<compound-statement> }

Synchronized blocks are intended in part to address some of the difficulties with using mutexes for synchronizing memory access by raising the level of abstraction and providing greater implementation flexibility. (See Generic Programming Needs Transactional Memory by Gottschlich and Boehm in Transact 2013 for a discussion of some of these issues.) With synchronized blocks, a programmer need not associate locks with memory locations, nor obey a locking discipline to avoid deadlock. Deadlock cannot occur if synchronized blocks are the only synchronization mechanism used in a program.

Although synchronized blocks can be implemented using a single global mutex, we expect that some implementations of synchronized blocks will exploit recent hardware and software mechanisms for transactional memory to improve performance relative to mutex-based synchronization. For example, threads may use speculation and conflict detection to evaluate synchronized blocks concurrently, discarding speculative outcomes if conflict is detected. Programmers should still endeavor to reduce the size of synchronized blocks and the conflicts between synchronized blocks: poor performance is likely if synchronized blocks are too large or concurrent conflicting evaluations of synchronized blocks are common. In addition, certain operations, such as I/O, cannot be executed speculatively, so their use within synchronized blocks may hurt performance.
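To make the "single global recursive mutex" description concrete, here is a minimal sketch in standard C++ that emulates that baseline behavior. It is not the TS syntax and not how a real TM runtime would work: g_sync_mutex, synchronized_emulated, and count_with_synchronized are hypothetical names invented for this illustration, and the emulation does no speculation at all.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the implicit global lock described above.
std::recursive_mutex g_sync_mutex;

// Hypothetical helper: runs `body` roughly the way `synchronized { body }`
// is specified to behave, i.e. as if guarded by one global recursive mutex.
template <typename F>
void synchronized_emulated(F body) {
    std::lock_guard<std::recursive_mutex> guard(g_sync_mutex);
    body();
}

// Increments a shared counter from several threads. The inner call nests a
// second emulated block, which is safe because the mutex is recursive.
int count_with_synchronized(int threads, int iterations) {
    assert(threads >= 0 && iterations >= 0);
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                synchronized_emulated([&counter] {
                    synchronized_emulated([&counter] { ++counter; });
                });
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling count_with_synchronized(4, 1000) should return 4000, since no increment can be lost. A compiler that actually implements the TS would let you write synchronized { ++counter; } directly and would be free to run non-conflicting blocks concurrently.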

atomic_noexcept | atomic_commit | atomic_cancel {<compound-statement> }

Atomic blocks are intended in part to replace many uses of mutexes for synchronizing memory access, simplifying the code and avoiding many problems introduced by mutexes (e.g., deadlock). We expect that some implementations of atomic blocks will exploit hardware and software transactional memory mechanisms to improve performance relative to mutex-based synchronization. Nonetheless, programmers should still endeavor to reduce the size of atomic blocks and the conflicts among atomic blocks and with synchronized blocks: poor performance is likely if atomic blocks are too large or concurrent conflicting executions of atomic and synchronized blocks are common.

The suffix that follows atomic_ in the keyword is the atomic block’s exception specifier. It specifies the behavior when an exception escapes the transaction:

atomic_noexcept: This is undefined behavior and is not allowed; no side effects of the transaction can be observed.

atomic_commit: The transaction is committed and the exception is thrown.

atomic_cancel: If the exception is transaction-safe (defined above), the transaction is canceled and the exception is thrown. Otherwise, it is undefined behavior. In either case, no side effects of the transaction can be observed.
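The difference between committing and canceling on an escaping exception can be illustrated with a small emulation in standard C++. This is only a sketch under stated assumptions: it uses a global lock plus a snapshot-and-restore of one object, whereas a real TM implementation tracks all memory effects; it ignores the restriction on which exception types may cancel a transaction; and the names g_tx_mutex, atomic_commit_emulated, atomic_cancel_emulated, and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical global lock standing in for a TM runtime.
std::recursive_mutex g_tx_mutex;

// Emulates atomic_commit: if an exception escapes, the changes made so far
// stay visible and the exception propagates to the caller.
template <typename State, typename F>
void atomic_commit_emulated(State& state, F body) {
    std::lock_guard<std::recursive_mutex> guard(g_tx_mutex);
    body(state);
}

// Emulates atomic_cancel: if an exception escapes, the state is restored
// from a snapshot before the exception is rethrown.
template <typename State, typename F>
void atomic_cancel_emulated(State& state, F body) {
    std::lock_guard<std::recursive_mutex> guard(g_tx_mutex);
    State snapshot = state;
    try {
        body(state);
    } catch (...) {
        state = snapshot;  // discard the effects of the block
        throw;
    }
}

// Block body used by both demos: one visible effect, then an exception.
void push_then_throw(std::vector<int>& v) {
    v.push_back(42);
    throw std::runtime_error("escapes the block");
}

std::size_t size_after_commit() {
    std::vector<int> v;
    assert(v.empty());
    try { atomic_commit_emulated(v, push_then_throw); }
    catch (const std::runtime_error&) {}
    return v.size();  // the push_back was kept
}

std::size_t size_after_cancel() {
    std::vector<int> v;
    assert(v.empty());
    try { atomic_cancel_emulated(v, push_then_throw); }
    catch (const std::runtime_error&) {}
    return v.size();  // the push_back was rolled back
}
```

In this emulation size_after_commit() yields 1 and size_after_cancel() yields 0, which mirrors the committed versus canceled outcomes described above. There is no emulation of atomic_noexcept because an escaping exception there is simply not allowed.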

Now for an example.

Below we show an attempt to use locks for generic programming, and explain a fundamental problem with it. After that, we show how the same problem can be elegantly solved using transactions. These examples are based on examples in Generic Programming Needs Transactional Memory by Justin Gottschlich and Hans Boehm (TRANSACT 2013).


template <typename T>
class concurrent_sack
{
public:
  ...
  void set(T const &obj) {
    lock_guard<mutex> _(m_);
    item_ = obj;
  }
  T const & get() const {
    lock_guard<mutex> _(m_);
    return item_;
  }
private:
  T item_;
  mutable mutex m_;  // mutable so that get() const can lock it
};

class log {
public:
  ...
  void add(string const &s) {
    lock_guard<recursive_mutex> _(m_);
    l_ += s;
  }
  void lock() { m_.lock(); }
  void unlock() { m_.unlock(); }
private:
  recursive_mutex m_;
  string l_;
} L;

 

class T {
public:
  ...
  T& operator=(T const &rhs) {
    if (!check_invariants(rhs))
      { L.add("T invariant error"); }
    return *this;
  }
  bool check_invariants(T const& rhs)
  { return /*type-specific check*/; }
  string to_str() const { return "..."; }
};

Given the declarations above, the following program results in deadlock. There is no way to order the locks to avoid this.

// Globally define sack
concurrent_sack<T> sack;

Thread 1                                Thread 2
--------                                --------
                                        // acquires L.m_
                                        lock_guard<log> _(L);
// acquires sack::m_
sack.set(T());
                                        // tries to acquire sack::m_
                                        // (deadlock)
                                        L.add(sack.get().to_str());
                                        L.add("...");
// tries to acquire L.m_ (deadlock)
// if T::operator=()’s call to
// check_invariants() returns false
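The interleaving above cannot be run safely as written, since it would hang. As a rough illustration, the following standard C++ sketch reproduces the same lock-order inversion but uses try_lock instead of a blocking lock, so the program reports the problem rather than deadlocking. The function probe_lock_inversion and its local names are hypothetical, and the two plain mutexes only stand in for L.m_ and sack::m_.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Returns how many of the two threads managed to take their second lock.
// Each thread keeps its first lock until both have made their attempt, so
// a blocking lock() in place of try_lock() would never return.
int probe_lock_inversion() {
    std::mutex log_mutex;           // stands in for L.m_
    std::mutex sack_mutex;          // stands in for sack::m_
    std::atomic<int> holding{0};    // threads that hold their first lock
    std::atomic<int> attempted{0};  // threads that finished their try_lock
    std::atomic<int> acquired{0};   // successful second-lock acquisitions

    auto worker = [&](std::mutex& first, std::mutex& second) {
        std::lock_guard<std::mutex> hold(first);
        ++holding;
        // Wait until both threads hold their first lock.
        while (holding.load() < 2) std::this_thread::yield();
        if (second.try_lock()) {
            ++acquired;
            second.unlock();
        }
        ++attempted;
        // Keep the first lock until the other thread has also tried.
        while (attempted.load() < 2) std::this_thread::yield();
    };

    // Thread 1 takes the sack lock, then wants the log lock.
    std::thread t1(worker, std::ref(sack_mutex), std::ref(log_mutex));
    // Thread 2 takes the log lock, then wants the sack lock.
    std::thread t2(worker, std::ref(log_mutex), std::ref(sack_mutex));
    t1.join();
    t2.join();
    assert(attempted.load() == 2);
    return acquired.load();
}
```

Here probe_lock_inversion() returns 0: neither thread can obtain its second lock while the other holds it, which is exactly the situation that becomes a deadlock with blocking acquisition.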

 

Next we revisit the same problem using transactions.


template <typename T>
class concurrent_sack
{
public:
  ...
  void set(T const &obj) {
    atomic_cancel { item_ = obj; }
  }
  T const & get() const {
    atomic_cancel { return item_; }
  }
private:
  T item_;
};

class log {
public:
  ...
  void add(string const &s) {
    atomic_cancel { l_ += s; }
  }
private:
  string l_;
} L;

class T {
public:
  ...
  T& operator=(T const &rhs) {
    if (!check_invariants(rhs))
      { L.add("invariant error"); }
    return *this;
  }
  bool check_invariants(T const& rhs)
  { return /*type-specific check*/; }
  string to_str() const { return "..."; }
};


With these declarations, the problem can be solved as follows. Note that the order in which the transactions are invoked does not matter, because no named locks are involved that could be misordered, leading to deadlock as shown in the prior example.

Instead, transactions are used for this generic programming example, enabling the generic programmer to build the system the way he or she believes it should be built, without leaking the implementation details to the end programmer.

Likewise, the end programmer can program in the most natural fashion for him or her without worrying about violating some embedded locking order within the generic programming code that he or she is using.

// Globally define sack
concurrent_sack<T> sack;

Thread 1                                Thread 2
--------                                --------
                                        // begins local transaction
                                        atomic_cancel
                                        {
// begins sack transaction
sack.set(T());

// begins L transaction if
// T::operator=()’s call to
// check_invariants()
// returns false
                                          // begins sack transaction,
                                          // then L transaction
                                          L.add(sack.get().to_str());
                                          L.add("...");
                                        }
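For readers without a compiler that supports the TS, the example above can be approximated in standard C++ by letting one global recursive lock play the role of every transaction. This is only a sketch of the observable behavior when no exception escapes, not of how a TM system executes: transaction_emulated, event_log, item, and run_both_threads are hypothetical names, item is a simplified stand-in for the class T above, and the invariant check (a negative value) is invented for the demonstration.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical: one global lock replaces all named locks.
std::recursive_mutex g_tx_lock;

// Hypothetical helper that runs `body` as one emulated transaction.
template <typename F>
auto transaction_emulated(F body) -> decltype(body()) {
    std::lock_guard<std::recursive_mutex> guard(g_tx_lock);
    return body();
}

class event_log {  // plays the role of the class log above
public:
    void add(const std::string& s) {
        transaction_emulated([&] { text_ += s; });
    }
    std::string text() const {
        return transaction_emulated([&] { return text_; });
    }
private:
    std::string text_;
};

event_log L;

struct item {  // simplified stand-in for the class T above
    int value;
    explicit item(int v = 0) : value(v) {}
    item(const item&) = default;
    item& operator=(const item& rhs) {
        // Invented invariant: negative values are reported to the log.
        if (rhs.value < 0) { L.add("invariant error;"); }
        value = rhs.value;
        return *this;
    }
    std::string to_str() const { return std::to_string(value); }
};

template <typename T>
class concurrent_sack {
public:
    void set(const T& obj) {
        transaction_emulated([&] { item_ = obj; });
    }
    T get() const {
        return transaction_emulated([&] { return item_; });
    }
private:
    T item_;
};

concurrent_sack<item> sack;

// Runs the two threads from the table above and returns the final log text.
std::string run_both_threads() {
    std::thread t1([] { sack.set(item(-1)); });
    std::thread t2([] {
        transaction_emulated([] {
            L.add(sack.get().to_str() + ";");
            L.add("...;");
        });
    });
    t1.join();
    t2.join();
    std::string text = L.text();
    assert(!text.empty());
    return text;
}
```

Both threads always finish, and because Thread 2's two log entries are added inside one emulated transaction, they stay adjacent in the log. Depending on which thread runs first, the log reads either "invariant error;-1;...;" or "0;...;invariant error;".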