Singleton revisited

It happens quite often that we have to deal with annoying objects used everywhere in an application. This may be the result of a poorly designed application but, sometimes, it is simply an inevitable condition. With this in mind, and considering that a design refactoring to remove the global dependency is sometimes not affordable because of business needs, or not convenient because it leads to more complex code, I'll try to propose a non-canonical solution to the problem.

The obvious solution is the global scope, but we all know it is better avoided if you don't want to run into sneaky problems, easy to cause and much harder to solve. We learned to be wary of the global scope, as it is indeed a bad programming practice that often causes long debugging sessions. But the problem still remains: it is at least useful (sometimes necessary) to have variables valid throughout the whole program.

To bring the most relevant drawbacks of the global scope to light, consider that using it means giving up any guarantee of data consistency: global variables can be modified at any time without us, as programmers, being really aware of it.

Global sharing can be a necessity, and that's why a standard pattern like the Singleton exists. True, this pattern does something more, enforcing the uniqueness of a given type's instance; however, at least as far as my own experience is concerned, the use of a Singleton is often justified by the will to refer to a unique object from any point of the program, effortlessly.

I’ll try to describe an alternative which, from my perspective, can work better.

A typical case

A typical example is the implementation of a logger to be used application-wide. If you don't want to pass that object around as a parameter in each and every function and constructor, you'll probably implement it as a Singleton.

At first glance it is a good idea: the logger must be unique and shared. The snippet below is a quite standard implementation:

#include <iostream>
#include <string>

class Logger{

  Logger() = default; // private: get() is the only way to reach the instance

  public:

  static Logger& get(){
    static Logger instance; // created on first use, destroyed at program exit
    return instance;
  }

  void log(const std::string& s){
    std::cout << s << std::endl;
  }
};
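Using it is then a one-liner from anywhere in the program:

Logger::get().log("hello from anywhere"); // always the same instance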

Ok, this is quite nice: it achieves the result and also avoids troubles with resource acquisition and release, which you might have if you defined the instance as a private static class member.

But the Singleton is a weak choice here, because it enforces the uniqueness of the object for the whole application lifetime, and that is not what we really want.

Let’s suppose that we need to log the activity of a really long function, and that we want to log to a different file every time the function is executed. (In the following examples I’m omitting some details for simplicity).

void g(){
  singleton::Logger::get().log("do something");
}

void f(){
  singleton::Logger::get().start();

  singleton::Logger::get().log("call g()");
  g();
  singleton::Logger::get().end();
}

The start and end methods open and close a new file to log to. It works decently, but it's probably not the most beautiful code you've ever seen: too many things can go wrong once that snippet becomes part of a much more complex application.

Suppose there are many functions whose activity you need to log: in each of them you need to insert those calls to open and close the file, so that the other functions are logged the right way. What if you forget to close the file at the end of one of those functions? The next call would open a new log file, leaving the old one open. And what if in the next call you don't even open the new file? Logging would continue to the old file, causing unexpected behaviour.

As programmers, our task is to write error-free code and, since that is a hard one, the best we can do is design things so that they are not error prone.

In the example above, the first reference to the Singleton creates it but does not really initialize it: at that point, even though we have the object, we can't use it until some preliminary action is taken. Moreover, we would like to be unable to refer to a resource once it has been released, which is what happens after the file is closed in the example. Ultimately, the problem in that example is that the Singleton does more than what we really want it to do: the logger must not be unique and accessible for the whole application lifetime, but only for a part of it.

This limit of the Singleton became clear to me when I was asked to put such a resource under test. In that case it was not a logger but a global component that, once under test, had to become local to the test. In practice, I had a global (an exotic one, implemented as a Singleton, but still a global) to handle. That led me to try some variations on a theme, till I realized that I was searching for something like dynamic scope.

C, C++, Java, C# and many other languages bind names to definitions using the static scope mechanism: by statically looking at the code, it is possible to decide which definition a variable name refers to.

int i;

void b(); // forward declaration, so that f() can call it

void f(){
  int i=9;
  b();
  // here i refers to the local variable
}

void b(){
  // here i refers to the variable in global scope
}

void a(){
  // here i refers to the variable in global scope
  i= 3;
  b();
  f();
}

With dynamic binding, instead, the variable a symbol refers to depends on the current call stack. In the example above, inside b() the name i would refer either to the global or to the local defined in f(), depending on the call stack that led to the execution of b().

If we had something like dynamic scope, we could define a new object each time the need for a new resource arises, share it thereafter as if it were a global, and have it released when the function which acquired the resource terminates.

In the example below, things are made a little more complicated so that any type can be defined into the dynamic scope, even one that requires some parameters to be constructed:

#include <stack>

template <typename T>
struct dynamic_scope {

  static std::stack<T>& instances(){
    static std::stack<T> m_instances; // one stack per managed type
    return m_instances;
  }

  static T& instance() {
    return instances().top(); // the innermost (most recent) definition
  }

  template <typename... Args>
  dynamic_scope(Args... args){
    instances().push(T(args...)); // acquire: the new instance shadows the previous one
  }

  ~dynamic_scope(){
    instances().pop(); // release: the shadowed instance becomes current again
  }
};

void f2(){
  dynamic_scope<Console> the_console(Console(40L, 3L)); // Console is a logging class whose definition is omitted
  dynamic_scope<Console>::instance() << "log from f2";
}

void f1(){
  dynamic_scope<Console>::instance() << "log from f1";
  f2();
}

int main(int argc, char* argv[]) {
  dynamic_scope<Console> the_console(40L, 3L);
  dynamic_scope<Console>::instance() << "log something";
  f1();

  return 0;
}

What I did here is enforce the uniqueness of a stack of instances. Each instance overrides the previous one from the moment it is defined until it is released.

In the example, the main function and f1 log their data to a different Console than the more specific one f2 uses. Moreover, every call to f2 causes the acquisition of a new Console object. Each time a new dynamic_scope object is created, it is pushed onto the static stack. The instance in use is always the one on top of the stack; it is popped and destroyed once the corresponding dynamic_scope object goes out of scope. At that point, the last overridden instance becomes the current one again. The name given to the variable that manages the dynamic scope is not important.

A little programming puzzle can help clarify even more. What’s the output of the following program?

#include <iostream>
#include <stack>

template <typename T>
struct dynamic_scope {
  
  static std::stack<T>& instances(){
    static std::stack<T> m_instances;
    return m_instances;
  }

  static T& instance() {
    return instances().top();
  }
                     
  template <typename... Args>
  dynamic_scope(Args... args){
    instances().push(T(args...));
  }

  ~dynamic_scope(){
    instances().pop();
  }
};

void f2(){
  std::cout << ++dynamic_scope<long>::instance() << std::endl;
}

void f1(){
  dynamic_scope<long> the_long(10);
  std::cout << ++dynamic_scope<long>::instance() << std::endl;

  f2();
}

int main(int argc, char* argv[]){
  dynamic_scope<long> the_long(80);

  std::cout << dynamic_scope<long>::instance()  << std::endl;
  f1();
  std::cout << dynamic_scope<long>::instance() << std::endl;
  
  return 0;
}

Here I defined a unique instance of an integer on the dynamic scope. When the program enters the main function, one instance of the integer is created on the dynamic scope. Its value is 80, and that is the value sent to stdout.

When the program enters f1(), the integer instance defined in its scope overrides the previous one, becoming the current one in use. Its initial value is 10, which is incremented by one; the resulting 11 is sent to stdout before f2() gets called.

In the scope of f2(), the integer is accessible and refers to the last instance defined, the one from f1(); a further increment therefore makes its value 12, which is what f2() prints.

When the calls are popped from the call stack and execution returns to the main function, the integer instance defined in main() itself is restored as the current one, so the last instruction prints 80 again.
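Putting the walkthrough together, the program prints:

80
11
12
80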

Limits

I don't think this is really dynamic scope implemented in C++, because we override variables based on their type, not on their name. We always refer to the unique active instance of a type, so we can never refer to two different objects of the same type at the same time if they are managed by the “dynamic scope”.
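For instance (the names a and b are illustrative):

dynamic_scope<long> a(1);
dynamic_scope<long> b(2); // manages the SAME stack<long>: it shadows a
// dynamic_scope<long>::instance() now returns 2; the instance holding 1
// is unreachable until b is destroyed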

I hope these notes of mine will be useful to someone.

Writing down your ideas is a great thing

And it is even better if someone reviews them.

After I wrote this and asked for a review, the reviewer pointed out that someone had already made a similar proposal: Bob Smidth, the author of this article published on ACCU-2085, had similar motivations to arrange a design and write his own proposal.

Aside from the use of the stack, which adds features and tries to highlight how the somewhat forgotten concept of dynamic scope can be a good solution to a common problem, the approach I proposed here is very close to the one published by Bob. To me, the difference lies in a different compromise between invasiveness and ease of use.

The tool I wrote here is not the most terse (I mean, it is quite verbose) but, on the plus side, it makes clear that we are accessing variables that do not follow the common scope rules.

I mentioned invasiveness because the pattern proposed by Bob requires wrapping every interface you want to behave as a Singleton (or, with some modifications, to live in the dynamic scope) into a mono_nvi wrapper. This simplifies the usage for sure, but at the cost of some boilerplate.

Anyway, the two solutions are really similar: to switch to something very close to the one in ACCU-2085, you just need to write the small piece of boilerplate that defines a temporary used to access the instance on top of the global stack. This code tries to synthesize both approaches:

#include <iostream>
#include <stack>
#include <string>

struct ilogger {
  virtual void info(const char* msg) const = 0;
};

struct logger : public ilogger {
  std::string m_name;
  
  logger(std::string logger_name) : m_name(logger_name){}
  
  virtual void info(const char* msg) const {
    std::cout << m_name.c_str() << ":  " <<  msg << std::endl;
  }
};

template <typename T>
struct dynamic_scope {

  static std::stack<T>& instances() {
    static std::stack<T> m_instances;
    return m_instances;
  }

  static T& instance() {
    return instances().top();
  }

  template <typename... Args>
  dynamic_scope(Args... args) {
    instances().push(T(args...));
  }

  ~dynamic_scope() {
    instances().pop();
  }


  struct entry {
    T& m_obj;
    
    entry() : m_obj(dynamic_scope::instance()) { }

    void info(const char* msg) const {
      m_obj.info(msg);
    }
  };

};



void f2() {
  dynamic_scope<logger>::entry the_logger;

  dynamic_scope<logger>::instance().info("from f2");
  the_logger.info("from f2 (simpler?");
}

void f1() {
  dynamic_scope<logger> dynamic_logger("f1");
  dynamic_scope<logger>::entry the_logger;
  
  dynamic_scope<logger>::instance().info("call f2");
  the_logger.info("call f2 (simpler?)");

  f2();
}

int main(int argc, char* argv[]) {
  dynamic_scope<logger> dynamic_logger("main");
  dynamic_scope<logger>::entry the_logger;

  dynamic_scope<logger>::instance().info("call f1");
  the_logger.info("call f1 (simpler?)");

  f1();
  dynamic_scope<logger>::instance().info("returned from f1");
  the_logger.info("returned from f1 (simpler?)");

  return 0;
}
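For reference, this is what the program above prints. Note that f2, which pushes no logger of its own, ends up writing through f1's logger:

main:  call f1
main:  call f1 (simpler?)
f1:  call f2
f1:  call f2 (simpler?)
f1:  from f2
f1:  from f2 (simpler?)
main:  returned from f1
main:  returned from f1 (simpler?)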

What if you are in a multithreaded world?

I don't want to attempt a Singleton solution for the multithreaded realm; I just want to warn about the dangers of such a solution when applied in a multithreaded application.

For sure, even if the underlying object is thread safe, you need to synchronize access to the shared global stack. You can do it using the inner structure entry: integrate a std::lock_guard into it, and be careful not to keep the temporary alive for too long. But is a shared dynamic scope really meaningful? To me, it is a stretch.
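A minimal sketch of that idea, assuming one mutex per managed type (the names mutex() and guarded_entry are mine, not part of the design above; push and pop in the constructor and destructor would need the same locking):

#include <mutex>
#include <stack>

template <typename T>
struct dynamic_scope {
  static std::mutex& mutex() {
    static std::mutex m; // guards the shared stack of T instances
    return m;
  }

  static std::stack<T>& instances() {
    static std::stack<T> m_instances;
    return m_instances;
  }

  // entry variant that holds the lock for its whole lifetime:
  // keep it short-lived, or every other thread stalls on the mutex
  struct guarded_entry {
    std::lock_guard<std::mutex> m_guard;
    T& m_obj;

    guarded_entry() : m_guard(mutex()), m_obj(instances().top()) {}
  };
};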

Spy your memory usage with operator new

Special thanks to Marco Alesiani for many corrections and suggestions.



When we say “efficiency”, we often think “time”: the sooner the code does its job, the more efficient it is.

What about memory? Granted, today even the lousiest laptop comes with “a bucket load” of RAM, which… is never enough. My PC “wastes” 1.4GB just to idle; I open a browser and 300 more MB are gone*.

…we take the occasion to apologize for the “Allowed memory size of … bytes exhausted” errors and the white pages that you may occasionally see on ++It. There is a reason why we care so much about memory.

Adding insult to injury, using memory is one of the slowest operations on current systems*.

(Italian only) Daniele Maccioni: Data Oriented Design: alte performance in C++

Moreover, finding the culprit line in the code is not easy. Was it a “new” we wrote? Some allocation hidden inside a library? Are temporary objects to blame?

How to easily find the part of the code that uses most of the memory?

This post collects some personal experiments. You can “thank” the author for any mistake.

Let’s use some memory

Today's toy code is nothing special, except for the wide variety of memory allocations it performs with operator new.


/* Program that allocates some memory when it feels like.
No delete – today’s essay is not about memory leaks. */
#include <string>
#include <memory>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#include "SomeClass.h"

void h() {
  SomeClass* t = new SomeClass();
}
void g() { h(); }
void f() { g(); }
void MakeSomeClass() { f(); }

int main(int argc, char **argv) {
  int * number = new int(89);
  std::string * test = new std::string("abc");

  SomeClass * oggetto = new SomeClass();
  MakeSomeClass();

  boost::shared_ptr<SomeClass> smartPointer = boost::make_shared<SomeClass>();
  std::shared_ptr<SomeClass> stdSmartPointer = std::make_shared<SomeClass>();
  return 0;
}

Compile, run and… almost 42MB (measured “on the cheap” with /usr/bin/time -v).

Who is using all that memory?

The right way: memory profiler

The idea should be familiar: the “classic” profiler tells how long each function executes; the memory profiler instead tells where and when the program uses memory, and how much.
For example, here is some of the information that Massif * returns about our program.
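(For reference, Massif ships with Valgrind; a typical session looks like this, where ./overridenew stands for our compiled test program:)

valgrind --tool=massif ./overridenew   # writes a massif.out.<pid> file
ms_print massif.out.<pid>              # renders the graph and snapshots below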

We can start with the memory growth (in ASCII art!) over “time” – actually its growth over the number of executed instructions:

    MB
38.23^                                                           ::::::::::::#
     |                                                           :           #
     |                                                           :           #
     |                                                           :           #
     |                                                           :           #
     |                                               :::::::::::::           #
     |                                               :           :           #
     |                                               :           :           #
     |                                               :           :           #
     |                                               :           :           #
     |                                   @@@@@@@@@@@@:           :           #
     |                                   @           :           :           #
     |                                   @           :           :           #
     |                                   @           :           :           #
     |                                   @           :           :           #
     |                       ::::::::::::@           :           :           #
     |                       :           @           :           :           #
     |                       :           @           :           :           #
     |                       :           @           :           :           #
     |                       :           @           :           :           #
   0 +----------------------------------------------------------------------->Mi
     0                                                                   6.203

Then we can get detailed snapshots (the “A”, “B” and “C” tags are ours):

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
...
  9      4,311,691       30,080,056       30,072,844         7,212            0
99.98% (30,072,844B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->99.73% (30,000,000B) 0x4078E8: __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (new_allocator.h:104)
| ->99.73% (30,000,000B) 0x40785A: std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) (alloc_traits.h:491)
|   ->99.73% (30,000,000B) 0x407800: std::_Vector_base<char, std::allocator<char> >::_M_allocate(unsigned long) (stl_vector.h:170)
|     ->99.73% (30,000,000B) 0x40777B: std::_Vector_base<char, std::allocator<char> >::_M_create_storage(unsigned long) (stl_vector.h:185)
|       ->99.73% (30,000,000B) 0x4076A7: std::_Vector_base<char, std::allocator<char> >::_Vector_base(unsigned long, std::allocator<char> const&) (stl_vector.h:136)
|         ->99.73% (30,000,000B) 0x407636: std::vector<char, std::allocator<char> >::vector(unsigned long, std::allocator<char> const&) (stl_vector.h:278)
|           ->99.73% (30,000,000B) 0x4075C5: SomeClass::SomeClass() (SomeClass.cpp:4)
|  A ====>   ->33.24% (10,000,000B) 0x405F91: main (main.cpp:20)
|             | 
|  B ====>    ->33.24% (10,000,000B) 0x405EC1: h() (main.cpp:10)
|             | ->33.24% (10,000,000B) 0x405EEF: g() (main.cpp:12)
|             |   ->33.24% (10,000,000B) 0x405EFB: f() (main.cpp:13)
|             |     ->33.24% (10,000,000B) 0x405F07: MakeSomeClass() (main.cpp:14)
|             |       ->33.24% (10,000,000B) 0x405F9A: main (main.cpp:21)
|             |         
|  C ====>    ->33.24% (10,000,000B) 0x4063F2: _ZN5boost11make_sharedI9SomeClassIEEENS_6detail15sp_if_not_arrayIT_E4typeEDpOT0_ (make_shared_object.hpp:254)
|               ->33.24% (10,000,000B) 0x405FA6: main (main.cpp:23)
|                 
->00.24% (72,844B) in 1+ places, all below ms_print's threshold (01.00%)

We quickly see that line 20 of the main, where one of our new calls sits, accounts for one third of the memory (A). Another 30% (B) is allocated in h() – Massif recorded the whole call stack at the moment of the allocation, and following it we can trace it down to the call to MakeSomeClass() in the main. Massif also captures the allocations made through shared pointers (C).

We can’t see the allocation at line 24 because it has not yet been executed and “intercepted” by Massif. We may spot it in a later snapshot. The remaining allocations are “small” and summarized in the last line.

A quick glance at the report tells us to go check the constructor of SomeClass. What the heck is it doing with a std::vector that takes 99% of the memory?

This is already a good result, obtained with little effort. Be aware that Massif can do more. It can measure the memory used “behind the scenes” by the system to make the heap work (extra-heap – 7,212 bytes in the example), track the stack…

The do-it-yourself way: override operator new

C++ allows you to replace the operator that creates objects (new) with a custom one.*

Almost nobody has a good reason to do so, but we do: I could not figure out how to use the profiler… I mean, we want to intercept heap allocations ourselves.

By and large, all we have to do is define a custom new (and its overloads) in any file of a program.

If the memory profiler is an equivalent of the “time” profiler, then you can compare this trick to the classic snippet cout << endTime - startTime;. Not really detailed or accurate, but simple and useful.

A few lines of code can give us something raw, but usable. You should compile with debug symbols. The code that outputs the stack trace probably works only on Linux*.


There is nothing portable when you work at such a low level.

If you are in the Microsoft world: https://msdn.microsoft.com/en-us/library/windows/desktop/bb204633%28v=vs.85%29.aspx.

That means:


#include <iostream>

#include <Windows.h> // Capture stack traces.
#include <Dbghelp.h> // Read debug symbols.

void StackTrace() {
  /* Capture the stack trace. */
  const ULONG doNotSkipAnyFrame = 0;
  const ULONG takeTenFrames = 10;
  const PULONG doNotHash = nullptr;
  PVOID stackTrace[takeTenFrames];
  const USHORT framesCaptured = CaptureStackBackTrace(
    doNotSkipAnyFrame,
    takeTenFrames,
    stackTrace,
    doNotHash
  );

  /* Prepare the symbol table to convert from addresses to lines of code. */
  const HANDLE thisProcess = GetCurrentProcess();
  SymInitialize(thisProcess, NULL, TRUE); // Link against Dbghelp.lib.

  for (ULONG i = 0; i < framesCaptured; i++) {
    /* Extract the function name. */
    const size_t nameStringSize = 256;
    SYMBOL_INFO * functionData = (SYMBOL_INFO*)malloc(sizeof(SYMBOL_INFO) + (nameStringSize + 1) * sizeof(char)); // +1 for the \0
    functionData->MaxNameLen = nameStringSize;
    functionData->SizeOfStruct = sizeof(SYMBOL_INFO);
    SymFromAddr(thisProcess, (DWORD64)(stackTrace[i]), 0, functionData);

    /* Find the file matching the function call. */
    DWORD displacementInLine;
    IMAGEHLP_LINE64 lineOfCode;
    lineOfCode.SizeOfStruct = sizeof(IMAGEHLP_LINE64);
    SymGetLineFromAddr64(thisProcess, (DWORD64)(stackTrace[i]), &displacementInLine, &lineOfCode);

    std::cout << functionData->Name << " at "
      << lineOfCode.FileName << ":" << lineOfCode.LineNumber << std::endl;
  }
}



// Our special new must allocate memory as expected…
#include <cstdio>
#include <cstdlib>
// …but also inspect the stack and print some results.
#include <execinfo.h>
#include <unistd.h>
#include <fstream>
// Import bad_alloc, to throw in case of errors.
#include <new>

/* Opens (once) and returns the file to save the results. */
static std::ofstream& resultFile() {
  static std::ofstream memoryProfile;
  static bool open = false; // Init on 1st use, as usual.
  if (! open) {
    memoryProfile.open ("allocations.txt");
    open = true;
  }
  // Else, handle errors, close the file…
  // We won’t do it, to keep the example simple.
  return memoryProfile;
}

/* This is the "magic" function that inspects the stack and writes it to a file. */
static void dumpStackTrace(std::ofstream& memoryProfile) {
  // Record 15 pointers to stack frames - enough for the example program.
  const int maximumStackSize = 15;
  void *callStack[maximumStackSize];
  size_t framesInUse = backtrace(callStack, maximumStackSize);
  // Now callStack is full of pointers. Request the names of the functions matching each frame.
  char ** mangledFunctionNames = backtrace_symbols(callStack, framesInUse);
  // Write all the function names in the stream.
  for (size_t i = 0; i < framesInUse; ++i)
    memoryProfile << mangledFunctionNames[i] << std::endl;
  // To be fair, we should release mangledFunctionNames with free…
}

/* Now we have all the elements to build the custom operator new. */
void* operator new(std::size_t sz) {
  // Allocate the requested memory for the caller.
  void * requestedMemory = std::malloc(sz);
  if (! requestedMemory)
    throw std::bad_alloc();
  // Share our allocations with the world.
  std::ofstream& memoryProfile = resultFile();
  memoryProfile << "Allocation, size = " << sz << " at " << static_cast<void*>(requestedMemory) << std::endl;
  dumpStackTrace(memoryProfile);
  memoryProfile << "-----------" << std::endl; // Poor man’s separator.

  return requestedMemory;
}
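For completeness, a real version of this trick should also replace the matching overloads, so that memory obtained from our malloc-based new is released consistently; a minimal sketch:

void* operator new[](std::size_t sz) { return operator new(sz); } // arrays take the same path
void operator delete(void* p) noexcept { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }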

Let’s add the “tricked out” operator new to our test program. This is an example of the result – can you guess the line of code behind it?

Allocation, size = 40 at 0x18705b0
./overridenew(_Z14dumpStackTraceRSt14basic_ofstreamIcSt11char_traitsIcEE+0x3c) [0x40672c]
./overridenew(_Znwm+0xaf) [0x406879]
./overridenew(_ZN9__gnu_cxx13new_allocatorISt23_Sp_counted_ptr_inplaceI9SomeClassSaIS2_ELNS_12_Lock_policyE2EEE8allocateEmPKv+0x4a) [0x405d9e]
./overridenew(_ZNSt16allocator_traitsISaISt23_Sp_counted_ptr_inplaceI9SomeClassSaIS1_ELN9__gnu_cxx12_Lock_policyE2EEEE8allocateERS6_m+0x28) [0x405bef]
./overridenew(_ZSt18__allocate_guardedISaISt23_Sp_counted_ptr_inplaceI9SomeClassSaIS1_ELN9__gnu_cxx12_Lock_policyE2EEEESt15__allocated_ptrIT_ERS8_+0x21) [0x4059e2]
./overridenew(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2I9SomeClassSaIS4_EJEEESt19_Sp_make_shared_tagPT_RKT0_DpOT1_+0x59) [0x4057e1]
./overridenew(_ZNSt12__shared_ptrI9SomeClassLN9__gnu_cxx12_Lock_policyE2EEC2ISaIS0_EJEEESt19_Sp_make_shared_tagRKT_DpOT0_+0x3c) [0x4056ae]
./overridenew(_ZNSt10shared_ptrI9SomeClassEC2ISaIS0_EJEEESt19_Sp_make_shared_tagRKT_DpOT0_+0x28) [0x40560e]
./overridenew(_ZSt15allocate_sharedI9SomeClassSaIS0_EIEESt10shared_ptrIT_ERKT0_DpOT1_+0x37) [0x405534]
./overridenew(_ZSt11make_sharedI9SomeClassJEESt10shared_ptrIT_EDpOT0_+0x3b) [0x405454]
./overridenew(main+0x9c) [0x4052e8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f83fe991830]
./overridenew(_start+0x29) [0x405079]
-----------
Allocation, size = 10000000 at 0x7f83fc9c3010
./overridenew(_Z14dumpStackTraceRSt14basic_ofstreamIcSt11char_traitsIcEE+0x3c) [0x40672c]
./overridenew(_Znwm+0xaf) [0x406879]
./overridenew(_ZN9__gnu_cxx13new_allocatorIcE8allocateEmPKv+0x3c) [0x406538]
./overridenew(_ZNSt16allocator_traitsISaIcEE8allocateERS0_m+0x28) [0x4064aa]
./overridenew(_ZNSt12_Vector_baseIcSaIcEE11_M_allocateEm+0x2a) [0x406450]
./overridenew(_ZNSt12_Vector_baseIcSaIcEE17_M_create_storageEm+0x23) [0x4063cb]
./overridenew(_ZNSt12_Vector_baseIcSaIcEEC1EmRKS0_+0x3b) [0x4062f7]
./overridenew(_ZNSt6vectorIcSaIcEEC2EmRKS0_+0x2c) [0x406286]
./overridenew(_ZN9SomeClassC1Ev+0x3d) [0x406215]
./overridenew(_ZN9__gnu_cxx13new_allocatorI9SomeClassE9constructIS1_JEEEvPT_DpOT0_+0x36) [0x405e3a]
./overridenew(_ZNSt16allocator_traitsISaI9SomeClassEE9constructIS0_JEEEvRS1_PT_DpOT0_+0x23) [0x405d51]
./overridenew(_ZNSt23_Sp_counted_ptr_inplaceI9SomeClassSaIS0_ELN9__gnu_cxx12_Lock_policyE2EEC2IJEEES1_DpOT_+0x8c) [0x405b4a]
./overridenew(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2I9SomeClassSaIS4_EJEEESt19_Sp_make_shared_tagPT_RKT0_DpOT1_+0xaf) [0x405837]
./overridenew(_ZNSt12__shared_ptrI9SomeClassLN9__gnu_cxx12_Lock_policyE2EEC2ISaIS0_EJEEESt19_Sp_make_shared_tagRKT_DpOT0_+0x3c) [0x4056ae]
./overridenew(_ZNSt10shared_ptrI9SomeClassEC2ISaIS0_EJEEESt19_Sp_make_shared_tagRKT_DpOT0_+0x28) [0x40560e]

...

…I can't. Where is “main+0xa8” in my code? Thankfully, in the “GNU/Linux world” there are tools to demangle names and find the point in the code that corresponds to a given address. We can use them, for example, in a simple script.


#!/usr/bin/python
#
# C++filt demangles names.
#
# addr2line converts code pointers (e. g. functions’ addresses)
# into the file:line couple corresponding to the code (if there are debug symbols).
#
# The python code should be portable, but the called utilities aren’t.
#

import re
import subprocess

# Opens a sub-process and passes shell commands to it. Returns the results as a string.
# Not very efficient, but easy.
def run_shell(command):
    return subprocess.Popen(command, stdout=subprocess.PIPE).communicate()[0]

if __name__ == "__main__":
    total_size = 0
    # There are 2 types of lines in the output: stack frames and allocation sizes.
    size_line = re.compile("Allocation, size = (\d+) at (\d+)") # Allocation, size = <bytes> at <pointer somewhere in the heap>
    stack_line = re.compile(".*\((.*)\+.*\) \[(.*)\]") # <rubbish>(mangled name) [<code pointer>]

    allocations_file = open("allocations.txt")
    for line in allocations_file:
        match_size = size_line.match(line)
        match_stack = stack_line.match(line)
        # For a demo, I compute the sum of all the used memory.
        # The things you can do with an overridden new!
        if (match_size):
            allocation_size = int(match_size.group(1))
            total_size += allocation_size
            print "Used " + str(allocation_size)
        elif (match_stack):
            mangled_name = match_stack.group(1)
            line_address = match_stack.group(2)
            demangled_name = run_shell(["c++filt", "-n", mangled_name])
            line_number = run_shell(["addr2line", "-e", "./overridenew", line_address])
            # This is not professional-grade formatting. The -1 cuts away the newlines.
            print "\t" + demangled_name[:-1] + "\n\t\t" + line_number,
        # Copy the separators as they were.
        else:
            print line

    print "\n total allocated size " + str(total_size)

As an alternative, we could do everything at run time, using the compilers' demangling utilities, such as the gcc one.
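For GCC and Clang, that utility is abi::__cxa_demangle from <cxxabi.h>; a minimal sketch (the mangled constructor name below is taken from the trace above):

#include <cxxabi.h>
#include <cstdlib>
#include <iostream>

int main() {
  int status = 0;
  // __cxa_demangle mallocs the returned string; the caller must free it.
  char* name = abi::__cxa_demangle("_ZN9SomeClassC1Ev", nullptr, nullptr, &status);
  if (status == 0)
    std::cout << name << std::endl; // prints "SomeClass::SomeClass()"
  std::free(name);
}

Personally, I prefer to keep the instrumentation code as simple as possible and do the “heavy lifting” off-line. My script returns: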

Used 40
    dumpStackTrace(std::basic_ofstream<char, std::char_traits<char> >&)
        /home/stefano/projects/code/spy-memory-with-new/InstrumentedNew.cpp:29
    operator new(unsigned long)
        /home/stefano/projects/code/spy-memory-with-new/InstrumentedNew.cpp:48
    __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<SomeClass, std::allocator<SomeClass>, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*)
        /usr/include/c++/5/ext/new_allocator.h:105
    
    ... internal calls of the shared pointer...
    
    std::shared_ptr<SomeClass> std::allocate_shared<SomeClass, std::allocator<SomeClass>>(std::allocator<SomeClass> const&)
        /usr/include/c++/5/bits/shared_ptr.h:620
    _ZSt11make_sharedI9SomeClassIEESt10shared_ptrIT_EDpOT0_
        /usr/include/c++/5/bits/shared_ptr.h:636
    main
        /home/stefano/projects/code/spy-memory-with-new/main.cpp:25
    __libc_start_main
        ??:0
    _start
        ??:?
-----------

Used 10000000
    dumpStackTrace(std::basic_ofstream<char, std::char_traits<char> >&)
        /home/stefano/projects/code/spy-memory-with-new/InstrumentedNew.cpp:29
    operator new(unsigned long)
        /home/stefano/projects/code/spy-memory-with-new/InstrumentedNew.cpp:48
    __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*)
        /usr/include/c++/5/ext/new_allocator.h:105
    
    ...internal calls of vector...
    
    std::vector<char, std::allocator<char> >::vector(unsigned long, std::allocator<char> const&)
        /usr/include/c++/5/bits/stl_vector.h:279
    SomeClass::SomeClass()
        /home/stefano/projects/code/spy-memory-with-new/SomeClass.cpp:4 (discriminator 2)
    ...

The first allocation is the 40 bytes requested by make_shared: 24 for SomeClass (its only member is a vector – sizeof(vector) is 24), and the rest should be the control block of the shared pointer. The second allocation is the 10MB made in the notorious constructor of SomeClass.
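A quick way to double-check the 24-byte claim (the exact number is implementation-specific; a 64-bit libstdc++ build is assumed here):

#include <iostream>
#include <vector>

struct SomeClass { std::vector<char> data; }; // same layout as the article's class

int main() {
  // a vector is typically three pointers (begin, end, end-of-capacity) on 64 bits
  std::cout << sizeof(std::vector<char>) << " " << sizeof(SomeClass) << std::endl; // usually "24 24"
}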

It takes some effort to navigate the stacks, but it is possible to understand that the mystery line was std::shared_ptr<SomeClass> stdSmartPointer = std::make_shared<SomeClass>(); – close to the return at main.cpp:25.

Homework: how many allocations would there be with std::shared_ptr<SomeClass> notSoSmartPointer(new SomeClass());?*

Three, and 8 more bytes are used. In a test I measured:
24 bytes for SomeClass’s instance
10 MB for the vector’s contents
24 bytes for the shared pointer.

Looking at the implementation notes, I believe that the difference is in the content of the shared pointer’s control block.


In the end…

Programmers have been fighting with memory since the dawn of time, either because there is too little of it or because it is too slow. As with every bottleneck, you can't trust your instincts. We saw that there are proper tools (memory profilers) to measure memory consumption, and we discovered that, in a pinch, there are “home-made” tools we can build ourselves with a “stereotypical C++ hack”: manipulating operator new.

You can find the “ready-to-compile” code in the ++It GitHub repo.

Coroutines Internals

The article's code is available on GitHub.

What are coroutines and why should I care?

In The Art of Computer Programming, Donald Knuth introduced coroutines as an alternative to the usual function caller/callee idiom, one where two pieces of code are treated as cooperating equals.
Coroutines can be thought of as language-level constructs that generalize subroutines by providing multiple exit/entry points. A normal subroutine usually has one starting point and one or more exit (return) points; a coroutine provides the ability to enter and exit its control flow at different spots, allowing for greater code expressiveness, preservation of automatic state across calls, and non-preemptive multitasking.

It has to be noted that different programming languages can provide various levels of support for coroutines, e.g.

  • Languages supporting the yield keyword
  • Languages providing full support for async, await, yield

In this article we’ll focus on the former.

Thoughtful use of coroutines can lead to cleaner and more maintainable code in a variety of situations. As a motivating example, let's take the following pseudocode:

function start_accepting() {
  socket.async_accept(accept_handler); // returns immediately; the handler runs later
}

function accept_handler() {
  start_accepting(); // re-arm the accept, then serve this connection
  socket.async_read(read_handler);
}

function read_handler(data) {
  request = parse_data(data);
  switch(request) {
    case SEND_DATA: {
      data_to_send = prepare_data_for(request);
      socket.async_write(data_to_send, write_handler);
    } break;
  };
}

function write_handler() {
  ... // continue execution
}

Asynchronous programming is often the preferred way to accomplish potentially blocking operations without stalling a thread on blocking calls. In the pseudocode above we assume (and omit, for clarity's sake) that all operations are queued and handled by an event loop manager (a common and powerful idiom in asynchronous application programming, cf. boost::asio).

Coroutines allow modeling the same behavior with more readable code

coroutine acceptor() {
  while(true) {
    socket.async_accept_connection(yield); // To event manager
    start_coroutine(serve_request);
  }
}

coroutine serve_request() {
  socket.async_read(data, yield);
  request = parse_data(data);
  switch(request) {
    case SEND_DATA: {
      data_to_send = prepare_data_for(request);
      socket.async_write(data_to_send, yield);
      ... // Continue execution
    } break;
  };
}

 

The code in serve_request() uses a sequential-looking paradigm

Coroutines let you create a structure that mirrors the actual program logic. Asynchronous operations don’t split functions, because there are no handlers to define what should happen when an asynchronous operation completes. Instead of having handlers call each other, the program can use a sequential structure.

(boost.asio-coroutines)

 

Standard support

At the time of writing this article coroutines didn’t make it into the next standard version (C++17), although recent MSVC versions already ship with an /await option to test experimental support for coroutines.

Coroutines internals

It is important to understand the role of coroutines in providing a collaborative non-preemptive multitasking: spawning a coroutine does not spawn a new thread of execution but coroutines waive execution by yielding to callers (asymmetric coroutines) or to other coroutines (symmetric coroutines) explicitly.

Since coroutines have been known for a relatively long time, many different implementation techniques (both at language level and system level) have been devised (an interesting suggested reading: Duff’s device based coroutines).

Implementing basic support for asymmetric stackful coroutines can be a rewarding experience in terms of understanding the relationship between coroutines and the way these program flow constructs interact with callers and the underlying memory. Most of the code that will be presented is a pruned-down version of the coroutine implementation by Oliver Kowalke (cfr. boost::coroutine2) available with the boost libraries.

Abstracting away the execution context

In order to implement a coroutine context switch (in its simplest form from the callee to the caller) we need a mechanism to abstract the execution state of our routines and to save/restore stack, registers, CPU flags, etc.

A context switch between threads on x86_64 can be quite costly in terms of performance since it also involves a trip through kernel syscalls. Coroutine context switches in the same thread are far more lightweight and require no kernel interaction. The foundation bricks for userland context switches are contained in the boost/context library. These functionalities provide a solid abstraction that can provide ready-to-use stacks (either basic malloc’d stack buffers or even split stacks on supported architectures) to store our context data or complement system-level constructs (e.g. ucontext on Unix systems, Fibers on Windows). It has to be noted that boost::context used to support Windows fibers but, due to undocumented TEB-swapping related issues, that support was dropped with the introduction of execution context v2.

In this article we’ll go for a fcontext implementation in assembler on a x86_64 Unix system (no system calls involved).

Saving the state

When a coroutine yields, an fcontext switch should occur, i.e. we should save whatever state the routine was in at that point in time and JMP to another context. On a recent Unix system calling conventions, object and executable file formats and other low-level ABI issues are defined by the System V ABI. On an x86_64 architecture the stack grows downwards and parameters to functions are passed in registers rdi, rsi, rdx, rcx, r8, r9 + additional stack space if needed. The stack is always 16-byte aligned before a call instruction is issued. Registers rbx, rsp, rbp, r12, r13, r14, and r15 are preserved across function calls while rax, rdi, rsi, rdx, rcx, r8, r9, r10, r11 are scratch registers:

Return value:         rax, rdx
Parameter registers:  rdi, rsi, rdx, rcx, r8, r9 + additional stack
Scratch registers:    rax, rdi, rsi, rdx, rcx, r8, r9, r10, r11
Preserved registers:  rbx, rsp, rbp, r12, r13, r14, r15

Therefore, following in boost::context‘s footsteps, a reasonable memory layout is the following

/****************************************************************************************
 *                                                                                      *
 *  ----------------------------------------------------------------------------------  *
 *  |    0    |    1    |    2    |    3    |    4     |    5    |    6    |    7    |  *
 *  ----------------------------------------------------------------------------------  *
 *  |   0x0   |   0x4   |   0x8   |   0xc   |   0x10   |   0x14  |   0x18  |   0x1c  |  *
 *  ----------------------------------------------------------------------------------  *
 *  |        R12        |         R13       |         R14        |        R15        |  *
 *  ----------------------------------------------------------------------------------  *
 *  ----------------------------------------------------------------------------------  *
 *  |    8    |    9    |   10    |   11    |    12    |    13   |    14   |    15   |  *
 *  ----------------------------------------------------------------------------------  *
 *  |   0x20  |   0x24  |   0x28  |  0x2c   |   0x30   |   0x34  |   0x38  |   0x3c  |  *
 *  ----------------------------------------------------------------------------------  *
 *  |        RBX        |         RBP       |         RIP        |       EXIT        |  *
 *  ----------------------------------------------------------------------------------  *
 *                                                                                      *
 ****************************************************************************************/

The EXIT field is going to be left unused for our purposes but it will be left in place anyway.

The first thing we need is to allocate space to store the context data and make sure it has a valid alignment for the architecture we’re dealing with

// Allocate context-stack space
context_stack = (void*)malloc(64_Kb);

std::size_t space = UNIX_CONTEXT_DATA_SIZE + 64;
sp = static_cast<char*>(context_stack) + 64_Kb - space;

sp = std::align(64, UNIX_CONTEXT_DATA_SIZE, sp, space);
assert(sp != nullptr && space >= UNIX_CONTEXT_DATA_SIZE);
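The 64_Kb quantity above is a user-defined literal whose definition is not shown in the snippet; a plausible sketch of it (name and factor are assumptions based on its usage) is:

#include <cstddef>

// Hypothetical definition of the 64_Kb user-defined literal used above
constexpr std::size_t operator""_Kb(unsigned long long n) {
    return static_cast<std::size_t>(n) * 1024;
}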

boost::context offers either memory-efficient on-demand growing stacks or fixed stack allocations (cfr. boost docs). In this example code we’ll go for a fixed stack allocation.
Since we can’t deal with registers directly in C++ we’ll have to fall back on a pure assembly routine. The GAS backend seems the logical tool of choice for this work. We therefore define an external function, to be linked against our executable, with C linkage:

extern "C" fcontext_t jump_to_context(fcontext_t context);

What is an fcontext_t? In an x86_64 world it is just a register’s content:

using fcontext_t = void*;

Luckily for us, RIP will have already been set up by the CALL instruction used to invoke jump_to_context: the call pushes the return address, so we get an instruction pointer on the stack for free in our assembly code:

.text
.globl jump_to_context
.type jump_to_context,@function
.align 16
jump_to_context:
    pushq  %rbp  /* save RBP */
    pushq  %rbx  /* save RBX */
    pushq  %r15  /* save R15 */
    pushq  %r14  /* save R14 */
    pushq  %r13  /* save R13 */
    pushq  %r12  /* save R12 */

    /* omissis */

    /* restore RSP (pointing to context-data) from RDI */
    movq  %rdi, %rsp

    popq  %r12  /* restore R12 */
    popq  %r13  /* restore R13 */
    popq  %r14  /* restore R14 */
    popq  %r15  /* restore R15 */
    popq  %rbx  /* restore RBX */
    popq  %rbp  /* restore RBP */

    /* continue... */

.size jump_to_context,.-jump_to_context

/* Mark that we don't need executable stack. */
.section .note.GNU-stack,"",%progbits

Using CMake, putting everything together becomes quite easy:

project(simple_crts CXX ASM)
cmake_minimum_required(VERSION 2.8.12)
set (CMAKE_CXX_STANDARD 14)

set_source_files_properties(jump_to_context_x86_64_elf_gas.S 
                            PROPERTIES COMPILE_FLAGS "-x assembler-with-cpp")

add_executable(simple_crts simple_crts.cpp jump_to_context_x86_64_elf_gas.S)
target_link_libraries(simple_crts ${CONAN_LIBS} pthread)

Trampolines to coroutines

Something is missing at this point: we need a valid RIP pointer to the coroutine to jump to. We could enter the coroutine and have another function store this information for us, but there’s a better way which avoids cluttering the coroutine code entirely: using a trampoline function.

Just as in boost::context, we define a trampoline function ourselves which, when jumped to, re-jumps to the caller and saves its context as a pre-stage for the coroutine:

void trampoline(fcontext_t ctx) {

  yield_ctx = jump_to_context(ctx);

  wannabe_coroutine();
}

What we have to do now is a simplified version of the make_context routine to set the first RIP towards the trampoline’s prologue:

// Do make_context's work (simplified)
// Do *NOT* try this at home (or even worse in the office)
void** addr = reinterpret_cast<void**>(static_cast<char*>(sp) +
                                         UNIX_CONTEXT_DATA_RIP_OFFSET);
*addr = reinterpret_cast<void*>(&trampoline);

// In a more complex case there might be additional initialization and
// frame adjustments going on
coroutine_ctx = jump_to_context(sp);

So right now we have a valid trampoline RIP set in place:

/****************************************************************************************
 *                                                                                      *
 *  ----------------------------------------------------------------------------------  *
 *  |    0    |    1    |    2    |    3    |    4     |    5    |    6    |    7    |  *
 *  ----------------------------------------------------------------------------------  *
 *  |   0x0   |   0x4   |   0x8   |   0xc   |   0x10   |   0x14  |   0x18  |   0x1c  |  *
 *  ----------------------------------------------------------------------------------  *
 *  |        R12        |         R13       |         R14        |        R15        |  *
 *  ----------------------------------------------------------------------------------  *
 *  ----------------------------------------------------------------------------------  *
 *  |    8    |    9    |   10    |   11    |    12    |    13   |    14   |    15   |  *
 *  ----------------------------------------------------------------------------------  *
 *  |   0x20  |   0x24  |   0x28  |  0x2c   |   0x30   |   0x34  |   0x38  |   0x3c  |  *
 *  ----------------------------------------------------------------------------------  *
 *  |        RBX        |         RBP       |         RIP        |       EXIT        |  *
 *  ----------------------------------------------------------------------------------  *
 *                                           ^^^^^^^^^^^^^^^^^^^^                       *
 ****************************************************************************************/

This kickstarts the bouncing to/from the trampoline:

.text
.globl jump_to_context
.type jump_to_context,@function
.align 16
jump_to_context:
    pushq  %rbp
    pushq  %rbx 
    pushq  %r15 
    pushq  %r14 
    pushq  %r13 
    pushq  %r12 

    /* store RSP (pointing to context-data) in RAX */
    movq  %rsp, %rax

    movq  %rdi, %rsp

    popq  %r12 
    popq  %r13 
    popq  %r14 
    popq  %r15 
    popq  %rbx 
    popq  %rbp 

    /* restore return-address (must have been put on the new stack) */
    popq  %r8

    /*
       pass the old context as first parameter (if we're headed
       towards a landing function)
    */
    movq  %rax, %rdi

    /* indirect jump to new context */
    jmp  *%r8

.size jump_to_context,.-jump_to_context
.section .note.GNU-stack,"",%progbits

It is important to note that we’re keeping the stack aligned during this entire process (recall that the stack has to be 16-bytes aligned before a call instruction is issued).

The process roughly goes on like this:
coroutines_graph1

It has to be noted that the trampoline function might reserve stack space for its parameters as well. In the code above we allocated 64Kb of heap space to be used as stack space for context operations. So after the first jump the sp automatic variable is no longer reliable. coroutine_ctx should be used instead.

Resuming fcontext

Resuming the trampoline’s fcontext requires another call, a RIP save, and a stack pointer adjustment to coroutine_ctx. The trampoline’s old RIP will be available for free after we’ve restored the first 48 bytes of the fcontext.

Execution can afterwards continue to the designated coroutine. At this point the coroutine should be somehow encapsulated to be able to use the yield_ctx context pointer: that is the gateway to our (in an asymmetric view) caller context.

Each time we want to yield execution back to the caller we’ll have to jump_to_context to the yield_ctx:

void yield() {
  yield_ctx = jump_to_context(yield_ctx);
}

void wannabe_coroutine() {
  std::cout << "I wanna be a coroutine when I grow my stack space up\n";
  yield();
  std::cout << "Hooray!\n";
  yield();
}

Notice that we’re also reassigning the variable with the return value provided by jump_to_context. This assignment is not executed until the control flow comes back to the yield() function:

.. save

/* store RSP (pointing to context-data) in RAX */
movq  %rsp, %rax

.. restore

This is a cooperative behavior example: each jump_to_context() invocation from this point onward actually returns fcontext data for the previous invocation.

The rest of the code bounces back and forth through the contexts resulting in the following sequence:

coroutines_graph2

At the end of the day the stack is freed (sounds weird to say) and the program terminates.

Exercise: wrapping up

As a didactic exercise (i.e. do not EVER use this code in a production environment) we can use some metaprogramming to wrap our coroutines and avoid polluting our code with stack adjustments and cleanup boilerplate. Eventually we’d like to end up with code like this

int g_fib = -1;

void fibonacci_gen() {

  int first = 1;
  g_fib = first;
  yield();

  int second = 1;
  g_fib = second;
  yield();

  for (int i = 0; i < 9; ++i) {

    int third = first + second;
    first = second;
    second = third;
    g_fib = third;
    yield();

  }

}

int main() {

  // Outputs the first 10 Fibonacci numbers

  coroutine<void(void)> coro(fibonacci_gen);

  for (int i = 0; i <= 10; ++i) {
      coro.resume();

    if(i) std::cout << g_fib << " ";
  }

}

To do this we create a templated wrapper class:

template <typename Signature> class coroutine;

that will handle the stack and trampoline setup for us. One difference from the first example is the need for a wrapper that will handle the trampoline invocation (shields us from implementation-dependent issues):

template <typename Signature>
void call_member_trampoline(coroutine<Signature> *instance, fcontext_t ctx) {
  instance->trampoline(ctx);
}

The trampoline is therefore modified as follows:

void trampoline(fcontext_t ctx) {

  size_t index = yield_ctx.size() - 1;
  yield_ctx[index] = jump_to_context(this, ctx);

  this->m_fn();
}

The only difference in the jump_to_context() function is in handling its new arity:

/* restore RSP (pointing to context-data) from RSI */
movq  %rsi, %rsp

and the promotion of %rdi from scratch-register to parameter-register (since we’re directly jumping to a destination context’s RIP).

The rest of the code remains largely unchanged.

Back to boost::context

If you’ve followed through the entire article and made it here, you should by now know what the following boost::context program does:

#include <boost/context/all.hpp>
#include <iostream>
#include <array>

namespace ctx = boost::context::detail;

class Coroutine {
public:
  Coroutine() {
    my_context = ctx::make_fcontext(
      stack.data() + stack.size(),
      stack.size(),
      &Coroutine::dispatch
    );
  }
  virtual ~Coroutine() {}

  void operator()() {
    auto transfer_ctx = ctx::jump_fcontext(my_context, this);
    my_context = transfer_ctx.fctx;
  }

protected:
  void yield() {
    auto transfer_ctx = ctx::jump_fcontext(yield_context, 0);
    my_context = transfer_ctx.fctx;
  }

  virtual void call() = 0;

private:
  static void dispatch(ctx::transfer_t coroutine_ptr) {
    Coroutine *coroutine = reinterpret_cast<Coroutine *>(coroutine_ptr.data);
    coroutine->yield_context = coroutine_ptr.fctx;
    coroutine->call();
    while(true)
      coroutine->yield();
  }

private:
  ctx::fcontext_t my_context;
  ctx::fcontext_t yield_context;
  std::array<intptr_t, 66 * 1024> stack;
};

struct A : public Coroutine {
  void call() {
    std::cout << " __________________________________ " << std::endl;
    yield();
    std::cout << "|    _       _       |_|    _| |_  |" << std::endl;
    yield();
    std::cout << "|   |_|     |_|      | |     | |_  |" << std::endl;
  }
};

struct B : public Coroutine {
  void call() {
    std::cout << "|                                  |" << std::endl;
    yield();
    std::cout << "|  _| |_   _| |_      _    |_   _| |" << std::endl;
    yield();
    std::cout << "|                    |_|     |___| |" << std::endl;
    yield();
    std::cout << "|                                  |" << std::endl;
    yield();
    std::cout << "|__________________________________|" << std::endl;
  }
};

struct C : public Coroutine {
  void call() {
    std::cout << "|                     _       _    |" << std::endl;
    yield();
    std::cout << "| |_   _| |_   _|    | |     | |   |" << std::endl;
  }

  void operator++(int) {
    std::cout << "| ++It - The Italian C++ Community |" << std::endl;
    std::cout << "|__________________________________|" << std::endl;
  }
};


int main() {

  A a;
  B b;
  C c;
  for (size_t i = 0; i<10; ++i) {
    a();
    b();
    c();
  }

  c++; // An entire operator overloading to write 'c++'? Worth it!
}

Final words

Coroutines provide a powerful abstraction offering the same level of concurrency one would get with asynchronous callbacks, while giving a chance to write more maintainable code. At the time of writing this article boost offers context, coroutine and fiber libraries. As we’ve seen, boost::context provides the foundation for userland context switches, boost::coroutine2 offers coroutine support (which conceptually needs no synchronization whatsoever since coroutines implement nonpreemptive cooperative multitasking) and boost::fiber builds on boost::context to add a scheduling mechanism: each time a fiber yields, control is given back to a scheduler manager which decides the next execution strategy (cfr. N4024).

As usual it is up to the programmer to carefully choose which abstractions are to be used in a specific context.

References and credits

Special thanks to the ++it community and Oliver Kowalke for providing insights and reviews of parts of this article.

Unicode, localization and C++ support https://www.italiancpp.org/2016/04/20/unicode-localization-and-cpp-support/ https://www.italiancpp.org/2016/04/20/unicode-localization-and-cpp-support/#comments Wed, 20 Apr 2016 10:06:20 +0000 http://www.italiancpp.org/?p=5941 This document doesn’t attempt to be yet another Unicode article but rather targets the fundamental points that should be of interest to a C++ programmer diving into the new standards and into the Unicode subject, either for the first time or as a refresher. It isn’t by any means a complete source of information on Unicode (there’s no mention of versioning, BOM sequences or other advanced topics like cluster composition algorithms or language-specific drawbacks addressing techniques) but is only meant to provide insights on features that might be relevant for a programmer.

What is Unicode

Unicode is an industry standard to encode text independently from language, platform or program. The idea is to assign a unique number (called code point) to each character used in writing. E.g. the codepoint for the Latin capital letter A is 0041 and, to indicate that this is a Unicode codepoint, a U+ prefix is commonly added: U+0041. One might notice that the same number is also the hex for the character A in the ASCII table. This is by design, since the first 256 code points were made identical to the content of the ISO-8859-1 ASCII-based encoding standard in order to make it easy to port existing western texts to Unicode. Attention has to be paid to an often misunderstood aspect: Unicode was originally designed to use 2 bytes for its codepoints, but this is no longer the case (e.g. emoji characters have been added to the Unicode pages and mapped from codepoint U+1F600 onward, which cannot be represented with just 2 bytes).

The takeaway from this paragraph is that Unicode is a standard way to assign a unique number called code point to each writing character.
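To make the notion concrete, here is a small sketch (using the C++11 char32_t literals introduced later in this article) showing that a codepoint is just a number, and that emoji fall outside the 2-byte range:

#include <iostream>

int main() {
    char32_t latinA = U'A';        // codepoint U+0041
    char32_t grin = U'\U0001F600'; // codepoint U+1F600, does not fit in 2 bytes
    std::cout << std::hex
              << static_cast<unsigned long>(latinA) << " " // 41
              << static_cast<unsigned long>(grin) << "\n"; // 1f600
}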

Implementing Unicode

Since Unicode maps each character to a (potentially very large) number, different encodings were developed to map this number (codepoint) to a sequence that can efficiently be transmitted or stored. Business interests and technical arguments caused standardization issues, which is why several competing encodings exist.

UCS

UCS stands for Universal Coded Character Set and it was originally a standard with goals similar to those of the Unicode consortium. Nowadays efforts have been made to (more or less) synchronize Unicode and UCS. Some of the original encodings are still used though: for instance, what is now called UCS-2 consists of encoding a codepoint in two bytes. UCS-2 is now deprecated since, again, 2 bytes cannot suffice to represent every codepoint in Unicode. Another UCS encoding is UCS-4 (now also called UTF-32) which uses 32 bits per codepoint (and this suffices to represent Unicode codepoints).

UTF

UTF stands for Unicode Transformation Format. Instead of a fixed-length encoding that uses the same amount of bytes to encode each codepoint, as UCS-2 or UCS-4 do, many UTF encodings prefer a variable-width encoding. UTF-8 is one of the most used and famous variable-length encodings. It uses 8-bit code units (the code unit is the basic unit of an encoded sequence; in UTF-8 it corresponds to one byte, which encodes both part of the character codepoint and some encoding-specific bits), so for example the “Hello” string in UTF-8 would be stored as:

 Hello
 0x48 0x65 0x6C 0x6C 0x6F

which, again, matches the result of directly translating the string through the ASCII table (5 bytes). Characters whose codepoints are below U+0080 are represented with a single byte. These are also the first 128 ASCII table characters. If the codepoint is higher, as in the case of the € euro sign U+20AC, the most significant bit is set to 1.

 €
 0xE2        0x82        0xAC
 1110 0010   10 000010   10 101100
 ^^^^        ^^          ^^
 bits not part of the character codepoint but only related to the encoding

In this case three bytes are necessary to represent in UTF-8 the € character. The number of bytes needed is described in the specification and decoding it is a straightforward procedure.

UTF-8 therefore takes a variable amount of bytes (1 up to 4 by design) to encode a codepoint. UTF-16 is also variable-length and uses one or two 16-bit code units; it was originally developed as a successor of the now obsolete UCS-2 (codepoints in the so-called BMP plane can be represented by UCS-2, other codepoints cannot and, as a workaround, are encoded with pairs of code units commonly referred to as surrogate pairs). UTF-32 uses 32 bits per code unit but, since 4 bytes is also the defined maximum to be used to encode a codepoint, this effectively makes UTF-32 a fixed-length encoding. Its usage is discouraged by the W3C.
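As a sketch of how straightforward the decoding logic is, the following hypothetical helper (not part of any standard API) reads the sequence length off a UTF-8 lead byte:

#include <cstdint>

// Sketch: number of bytes in a UTF-8 sequence, derived from the lead byte
// (validation of the continuation bytes is omitted)
int utf8_sequence_length(std::uint8_t lead) {
    if (lead < 0x80)         return 1; // 0xxxxxxx: single byte, ASCII range
    if ((lead >> 5) == 0x6)  return 2; // 110xxxxx
    if ((lead >> 4) == 0xE)  return 3; // 1110xxxx: e.g. 0xE2, the € lead byte
    if ((lead >> 3) == 0x1E) return 4; // 11110xxx
    return -1;                         // continuation or invalid lead byte
}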

C++11 and Unicode support

Since the C++11 standard some additional Unicode facilities have been integrated into the language. The C++ fundamental wchar_t type is dedicated to storing any supported code unit (usually 32 bits on systems that support Unicode, with the exception of Windows, where it is 16 bits). C++11 introduced char16_t and char32_t, types large enough to store UTF-16 code units (2 bytes each) and UTF-32 code units (4 bytes each). The char type remains dedicated to whatever representation can be most efficiently processed on the system and, on a machine where char is 8 bits, it is used for 8-bit code units. One should not assume that plain ASCII-table characters are encoded into a char sequence but rather treat it as containing 8-bit code units.

The header <string> provides useful typedefs to work with specializations of the base string class template:

std::string std::basic_string<char>
std::wstring std::basic_string<wchar_t>
std::u16string (C++11) std::basic_string<char16_t>
std::u32string (C++11) std::basic_string<char32_t>

Converting between a byte string std::string and a wide string std::basic_string<wchar_t> (std::wstring) is also supported natively starting with C++11. An example follows:

#include <iostream>
#include <codecvt>
#include <string>
#include <locale>
#include <iomanip>

int main() {
  using namespace std;

  ios_base::sync_with_stdio(false); // Avoids synchronization with C stdio on gcc         
                                    // (either localize both or disable sync)

  wcout.imbue(locale("en_US.UTF-8")); // change default locale

  unsigned char euroUTF8[] = { 0xE2, 0x82, 0xAC, 0x00 }; // Euro sign UTF8

  wstring_convert<codecvt_utf8<wchar_t>> converter_UTF8_wchar;
  wstring euroWideStr = converter_UTF8_wchar.from_bytes((char*)euroUTF8);
  wcout << euroWideStr << endl;

  string euroNarrowStr = converter_UTF8_wchar.to_bytes(euroWideStr);
  cout << euroNarrowStr << endl;
}

A locale is an immutable set of so-called facets that help write locale-aware code (i.e. render some features specific to a geographic area / culture). Examples are formatting time and date in a specific format (US or EU) or currency parsing. Each feature is represented via a facet class that encapsulates the locale-specific logic.

In the code above the default locale for the wcout global object (output for wide strings) is changed to English – US region with UTF-8 code pages.

After the setup, a sequence of UTF-8 encoded bytes which represent the € euro character codepoint are stored into an array.

The class template std::wstring_convert accepts a code conversion facet to perform the conversion to a wide string. The facets provided by the standard library suitable to be used are std::codecvt_utf8 and std::codecvt_utf8_utf16. std::codecvt_utf8 manages conversions from/to UTF-8 to/from UCS2 and from/to UTF-8 to/from UCS4. In order to understand why these conversions are available, recall that the fundamental type wchar_t is usually 32 bit with the exception of Windows systems (16 bit). std::codecvt_utf8_utf16 provides conversion from/to UTF-8 to/from UTF-16.
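As an aside, facets cover much more than character conversion; a tiny sketch (the availability of the "en_US.UTF-8" locale is system-dependent, so that name is an assumption) shows the numpunct facet driving digit grouping:

#include <iostream>
#include <locale>

int main() {
    // Imbue the stream with a named locale: its numpunct facet adds digit grouping
    std::cout.imbue(std::locale("en_US.UTF-8")); // throws std::runtime_error if the locale is missing
    std::cout << 1234567 << std::endl;           // 1,234,567 with US-style grouping
}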

Starting from C++11 new string literals were also added to specify encoding and type of a literal: L for wchar_t, u8 for UTF-8 encoded, u for UTF-16 encoded and U for UTF-32 encoded. Escape sequences for 16 bit and 32 bit codepoints were also added.

unsigned char euroUTF8_1[] = { 0xE2, 0x82, 0xAC, 0x00 };
unsigned char euroUTF8_2[] = u8"\U000020AC"; // Null character is always appended to the literal          
assert(memcmp(euroUTF8_1, euroUTF8_2, sizeof(euroUTF8_2) / sizeof(unsigned char)) == 0);

Using a 32 bit escape sequence to encode a 32 bit codepoint into a UTF-16 sequence will generate a sequence of UTF-16 code units that can be later converted or used:

#include <iostream>
#include <codecvt>
#include <string>
#include <locale>
#include <iomanip>

void hex_print(const std::string& s) {
  std::cout << std::hex << std::setfill('0');
  for(unsigned char c : s)
    std::cout << std::setw(2) << static_cast<int>(c) << ' ';
  std::cout << std::dec << '\n';
}

int main() {
  using namespace std;
  ios_base::sync_with_stdio(false);
  cout.imbue(locale("C.UTF-8"));

  u16string globeUTF16 = u"\U0001F30D"; // Globe                                             

  wstring_convert<codecvt_utf8_utf16<char16_t>, char16_t> convert_UTF16_UTF8;

  string utf8Str = convert_UTF16_UTF8.to_bytes(globeUTF16);
  cout << "UTF-8 code units: ";
  hex_print(utf8Str);

  cout << utf8Str << endl;
}

It has to be noted that even when string literals are used, compile-time error checking isn’t available and failures should be expected when doing nonsensical operations, like converting to UTF-8 a UCS-2 sequence of code units in the range 0xD800-0xDFFF which, by design, are reserved for the surrogate pairs UTF-16 uses to map codepoints outside the BMP plane:

char16_t globeUTF16[] = u"\U0001F34C";
wstring_convert<codecvt_utf8<char16_t>, char16_t> convert_UTF8_UCS2;
// std::string u8Str = convert_UTF8_UCS2.to_bytes(globeUTF16); // range_error                  

auto globeCodepoint = (globeUTF16[0] - 0xD800) * 0x400 + globeUTF16[1] - 0xDC00 + 0x10000;

cout << hex << globeCodepoint << endl; // 1F34C

At the time of writing this, few compilers actually implement this behavior, while others either completely lack the facet features or allow the illegal conversion.

Compilers settings

A few words are worth spending on compiler behaviors on different systems and architectures. Some compilers document their default execution charset: gcc has the -fexec-charset option, while MSVC has no command-line equivalent but rather a preprocessor directive

#pragma execution_character_set("utf-8")

This setting is not affected in MSVC by the “multi-byte character set” or “unicode” option in the project property pane, which instead instructs the compiler whether to use wide APIs or not.

Although there are still some rough edges, an ongoing effort to fix standard library implementations and to provide more features and algorithms to operate with Unicode is being carried out, and support will continue to improve with future C++ versions.

A big thanks to Stefano Saraulli, Alessandro Vergani and the reddit community for reviewing this article.

Folding Expressions https://www.italiancpp.org/2015/06/15/folding-expressions/ https://www.italiancpp.org/2015/06/15/folding-expressions/#comments Mon, 15 Jun 2015 16:32:38 +0000 http://www.italiancpp.org/?p=4980 People familiar with the new features C++11 brought to the C++ programming language should know what variadic templates are and why they’re important. Variadic templates can have a variable number of parameters of any type:

template <typename... Types> class tuple;

This not only brings type safety to the code but also ensures that all the variadic argument handling is performed at compile time. Before their introduction, in order to have a template accepting a variable number of template parameters, programmers used to write verbose code like:

template<typename T0>
void function( T0 arg0 );

template<typename T0, typename T1>
void function( T0 arg0, T1 arg1 );

template<typename T0, typename T1, typename T2>
void function( T0 arg0, T1 arg1, T2 arg2 );

template<typename T0, typename T1, typename T2, typename T3>
void function( T0 arg0, T1 arg1, T2 arg2, T3 arg3 );

...

Template parameter packs went hand-in-hand with variadic templates and together with constant expressions they enabled a recursive-like style of coding to create more complex compile-time operations:

#include <iostream>
#include <array>

template<size_t... Is> struct seq{};
template<size_t N, size_t... Is>
struct gen_seq : gen_seq<N-1, N-1, Is...>{};
template<size_t... Is>
struct gen_seq<0, Is...> : seq<Is...>{};

template<size_t N1, size_t... I1, size_t N2, size_t... I2>
// Expansion pack
constexpr std::array<int, N1+N2> concat(const std::array<int, N1>& a1, 
        const std::array<int, N2>& a2, seq<I1...>, seq<I2...>){
  return {{ a1[I1]..., a2[I2]... }};
}

template<size_t N1, size_t N2>
// Initializer for the recursion
constexpr std::array<int, N1+N2> concat(const std::array<int, N1>& a1, 
                                       const std::array<int, N2>& a2){
  return concat(a1, a2, gen_seq<N1>{}, gen_seq<N2>{});
}

int main() {
    constexpr std::array<int, 3> a1 = {{1,2,3}};
    constexpr std::array<int, 2> a2 = {{4,5}};

    constexpr std::array<int,5> res = concat(a1,a2);
    for(int i=0; i<res.size(); ++i)
        std::cout << res[i] << " "; // 1 2 3 4 5

    return 0;
}

Exploiting a constexpr overload of operator[], the code above generates an integer sequence, aggregate-initializes an std::array and concatenates the input arrays at compile time (details of the operations involved are available at this link).

The integer generation sequence construct was then provided by the standard library itself in C++14 with std::integer_sequence.
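With C++14 the hand-rolled gen_seq above can be dropped in favor of the standard facility; a sketch of the same concat rewritten on top of std::make_index_sequence could look like this:

#include <array>
#include <cstddef>
#include <utility>

template<std::size_t N1, std::size_t... I1, std::size_t N2, std::size_t... I2>
constexpr std::array<int, N1+N2> concat(const std::array<int, N1>& a1,
        const std::array<int, N2>& a2, std::index_sequence<I1...>, std::index_sequence<I2...>){
  return {{ a1[I1]..., a2[I2]... }};
}

template<std::size_t N1, std::size_t N2>
constexpr std::array<int, N1+N2> concat(const std::array<int, N1>& a1,
                                        const std::array<int, N2>& a2){
  // std::make_index_sequence replaces the hand-written gen_seq
  return concat(a1, a2, std::make_index_sequence<N1>{}, std::make_index_sequence<N2>{});
}

static_assert(concat(std::array<int, 2>{{1, 2}}, std::array<int, 1>{{3}})[2] == 3, "concat works at compile time");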

These features allowed new ways to exploit templates; however, parameter packs could only be used and expanded in a strictly-defined series of contexts. For instance, something like the following wasn’t allowed:

template<typename T>
void printer(T arg) {
    std::cout << arg << " ";
}

template<typename... Args>
static void function(Args &&... args) {
    (printer(std::forward<Args>(args)) , ...);
}

However, one of those restricted contexts was brace-init-lists, therefore workarounds to have parameter packs expanded were immediately deployed:

template<typename T>
void printer(T arg) {
    std::cout << arg << " ";
}

template<typename... Args>
static void function(Args &&... args) {
    // Expand the pack into a brace-init-list while discarding the return
    // values and filling an unused array
    int unusedVar[] = { 0, 
              ( (void) printer(std::forward<Args>(args)), 0) ... };
}


C++17 ~ fold expressions

C++17, scheduled for 2017 at the time of writing, will bring fold expressions into play and significantly broaden the contexts in which parameter packs can be used (cfr. N4191 paper).

As listed in cppreference, at the time of writing, there are four kinds of fold expressions:

  • Unary right fold
    ( pack op ... )
  • Unary left fold
    ( ... op pack )
  • Binary right fold
    ( pack op ... op init )
  • Binary left fold
    ( init op ... op pack )

being their respective expansions:

  • E_1 op (... op (E_N-1 op E_N))
  • ((E_1 op E_2) op ...) op E_N
  • E_1 op (... op (E_N-1 op (E_N op init)))
  • (((init op E_1) op E_2) op ...) op E_N

In binary folds the two op operators must be the same and init represents an expression without an unexpanded parameter pack (i.e. the initial value for the expanded expression).
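A quick sketch of a binary left fold, with 0 as the init expression (which also makes the empty pack well-formed):

#include <iostream>

template<typename... Ts>
auto sum(Ts... args) {
    return (0 + ... + args); // expands to (((0 + a1) + a2) + ...) + aN
}

int main() {
    std::cout << sum(1, 2, 3) << " " << sum() << std::endl; // 6 0
}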

With fold expressions writing a printer construct becomes straightforward:

#include <iostream>

template<typename F, typename... T>
void for_each(F fun, T&&... args)
{
    (fun (std::forward<T>(args)), ...);
}

int main() {
     for_each([](auto i) { std::cout << i << " "; }, 4, 5, 6); // 4 5 6
}

The sample above uses fold expressions together with the comma operator to create a simple function that calls the provided lambda per each one of the supplied arguments with perfect forwarding.

A caveat relative to the previous example though: unary right fold expressions and unary left fold expressions applied with the comma operator do yield different expressions but their evaluation order remains the same, e.g.

#include <iostream>
#include <memory>

template<typename F, typename... T>
void for_each1(F fun, T&&... args)
{
    (fun (std::forward<T>(args)), ...);
}

template<typename F, typename... T>
void for_each2(F fun, T&&... args)
{
    (..., fun (std::forward<T>(args)));
}

int main()
{
     for_each1([](auto i) { std::cout << i << " "; }, 4, 5, 6); // 4 5 6
     std::cout << std::endl;
     for_each2([](auto i) { std::cout << i << " "; }, 4, 5, 6); // 4 5 6
}

It has to be noted that one of the main reasons fold expressions were accepted as a C++17 proposal is their use in concepts:

template <typename T>
  concept bool Integral = std::is_integral<T>::value;

template <Integral... Ts> // A constrained-parameter pack
  void foo(Ts...);

template <typename... Ts>
  requires Integral<Ts>... // error: requirement is ill-formed
void foo(Ts...);

The problem boiled down to the same issue we discussed a few paragraphs ago: the parameter pack cannot expand in that context. Fold expressions provide an elegant and effective way to deal with this issue instead of resorting to other constexpr machinery to ensure requirements are met:

template<typename... Ts>
  requires (Integral<Ts> && ...)
void foo(Ts...);


 

References and sources:
N4191 – Folding expressions
cppreference
constexpr flatten list of std::array into array
Variadic template pack expansion
Why doesn’t a left fold expression invert the output of a right fold expression

Thanks to Marco Arena and Andy for the quick review of this article.

Serializing access to Streams https://www.italiancpp.org/2015/04/29/serializing-access-to-streams/ Wed, 29 Apr 2015 13:41:54 +0000 http://www.italiancpp.org/?p=3869 Last December, I was at Meeting C++ in Berlin and I attended – among others – the talk Multithreading done right? by Rainer Grimm.

In most of the examples, two or more threads were writing to cout using the form:

cout << someData << "some string" << someObject << endl;

And one of the problems was that data sent from one thread often interrupted another thread, so the output was always messed up.
You can see the problem in this example:

#include <iostream>
#include <vector>
#include <thread>

using namespace std;

int main() {
    vector<thread> thr;
    for(int i = 0; i < 10; i++) {
        thr.emplace_back([i]() { cout << "thread " << i << endl; });
    }
    for(int i = 0; i < 10; i++) {
        thr[i].join();
    }
}

Which on a test execution produces things like this with Clang:

tththrhrereaeadad d  201

And this on Visual Studio:

thread 0 thread thread 2 1

I speculated with Marco Arena about possible solutions to the problem, and tried to find a simple and elegant solution on my way back.

I started designing a solution by giving myself some guidelines, here listed in order of importance:

  1. Opt-in: The user doesn’t have to pay (no performance penalty if the feature is not used).
  2. Readability: The user code must be self-explaining.
  3. Simplicity of use: I wanted to provide the end-user with something that worked “out of the box”, requiring as little code as possible.
  4. Predictability: I wanted to maintain all the features and feature the standard streams have.
  5. Concise solution: I didn’t want to write tons of code, I’m too lazy.

An intrusive solution (e.g. adding a mutex to each stream) would have violated the first principle, and quite definitely the fifth too.

Wrapping streams

The first solution I tried was a simple stream wrapper that locks the stream using the RAII idiom:

class stream_locker {
    ostream& stream;
    lock_guard<mutex> guard;
public:
    stream_locker(ostream &s, mutex &m) : stream(s), guard(m) {}

    template<typename T> stream_locker& operator << (const T &x) {
        stream << x;
        return *this;
    }
};

Here’s the intended usage:

mutex m;
stream_locker(cout, m) << someData << "some string" << someObject << endl;

This didn’t work, because the type is not getting deduced for manipulators like endl (since it’s an overloaded function).

A solution to this problem could have been manually providing overloads for operator << for manipulators, such as:

class stream_locker {
    ...
    stream_locker& operator << (ostream& (*pf)(ostream&)) {
        stream << pf;
        return *this;
    }
 };

The big drawback with this solution was that we needed to provide an overload for every std manipulator type, like the ones declared in <ostream> and also any other manipulator contained in <iomanip>. Moreover, any overloaded operator << in other libraries or user code would have required additional effort (it violates both the simplicity and predictability guidelines I gave myself). This was not getting anywhere.

Another possible solution was to get back to using the ostream& as soon as possible, like after the first execution of operator <<.

class stream_locker {
    ostream& stream;
    lock_guard<mutex> guard;
public:
    stream_locker(ostream &s, mutex &m) : stream(s), guard(m) {}

    template<typename T> ostream& operator << (const T &x) {
        return stream << x;
    }
};

This was slightly better, but it would sometimes require inserting empty strings to achieve correct results. Consider the following:

stream_locker(cout, m) << value << " in octal is " << oct << value << dec << endl; // OK
stream_locker(cout, m) << oct << value << " in octal is " << dec << value << " in decimal." << endl; // ERROR
stream_locker(cout, m) << "" << oct << value << " in octal is " << dec << value << " in decimal." << endl; // OK

The second line won’t compile, since there’s no overload for operator << between a stream_locker and a manipulator like oct. The third will work, because inserting the empty string will result in calling our member operator <<, which now returns an ostream&.

We could definitely solve the problem by making the stream variable public, and changing the usage syntax slightly:

class stream_locker {
    lock_guard<mutex> guard;
public:
    ostream& stream;
    stream_locker(ostream &s, mutex &m) : stream(s), guard(m) {}
};

stream_locker(cout, m).stream << value << " in octal is " << oct << value << dec << endl; // OK
stream_locker(cout, m).stream << oct << value << " in octal is " << dec << value << " in decimal." << endl; // OK
stream_locker(cout, m).stream << "" << oct << value << " in octal is " << dec << value << " in decimal." << endl; // OK

This works, but the syntax looks horrible.

Lifetime of temporaries

Wait a second, why does this work? If we return the ostream& right away, doesn’t our temporary stream_locker get destroyed immediately, releasing the lock?

The answer is no. Any temporary variable created in a sub-expression has its lifetime extended to the “full-expression”, which in our case ends at the semicolon.
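A tiny probe (just a throwaway type with a chatty constructor and destructor) makes this visible: the temporary survives the whole statement.

#include <iostream>

struct Probe {
    Probe()  { std::cout << "ctor "; }
    ~Probe() { std::cout << "dtor"; }
};

std::ostream& operator<<(std::ostream& out, const Probe&) { return out; }

int main() {
    // The Probe temporary lives until the semicolon, so "dtor" prints last
    std::cout << Probe{} << "... done "; // prints: ctor ... done dtor
    std::cout << std::endl;
}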

The stream_locker object is now just behaving exactly like a normal lock_guard<mutex>. Can’t we just use a lock_guard<mutex> directly?

cout << lock_guard<mutex>(m) << ... ;

To provide this syntax we just need to overload the operator <<:

inline ostream& operator<<(ostream& out, const lock_guard<mutex> &) {
    return out;
}

This solution doesn’t have the drawbacks of the wrapper object, since no wrapper is created anywhere. The stream is returned right away, and the lock guard parameter is there for the sole sake of including it in the streaming chain.

This definitely looks promising, but the syntax is still a bit too complex. Could we possibly simplify it?

I would definitely prefer something like:

cout << lock_with(m) << ... ;

Basically, lock_with should return a fresh new lock_guard locking our mutex m. Something like this:

template <typename T> lock_guard<T> lock_with(T &mutex) {
    return lock_guard<T>(mutex);
}

Unluckily, this doesn’t work, as you can see yourself by trying to compile the following snippet.

#include <mutex>
#include <iostream>

using namespace std;

inline ostream& operator<<(ostream& out, const lock_guard<mutex> &) {
    return out;
}

template <typename T> lock_guard<T> lock_with(T &mutex) {
    return lock_guard<T>(mutex);
}

int main() {
    mutex m;
    cout << lock_with(m) << "test" << endl;
}

Problem is, lock_guard is not copyable. Moving is also deleted, because this class implements the concept of “scoped ownership”. Moreover, the constructor taking a mutex is explicit, so we can’t exploit copy-list-initialization in our solution. Not with that constructor, at least.

As often happens, the solution to my problem was already on Stack Overflow. The link explains why lock_guard is not moveable, and it provides a nice workaround, using another constructor taking two parameters (so we can use copy-list-initialization on that).

template <typename T> inline lock_guard<T> lock_with(T &mutex) {
    mutex.lock();
    return { mutex, adopt_lock };
}

The solution

Finally, we can test our original code with the final structure:

#include <mutex>
#include <vector>
#include <thread>
#include <iostream>

using namespace std;

inline ostream& operator<<(ostream& out, const lock_guard<mutex> &) {
    return out;
}

template <typename T> inline lock_guard<T> lock_with(T &mutex) {
    mutex.lock();
    return { mutex, adopt_lock };
}

int main() {
    mutex m;
    vector<thread> thr;
    for(int i = 0; i < 10; i++) {
        thr.emplace_back([i, &m]() { cout << lock_with(m) << "thread " << i << endl; });
    }
    for(int i = 0; i < 10; i++) {
        thr[i].join();
    }
}

Anti-IF idioms in C++ https://www.italiancpp.org/2014/11/23/anti-if-idioms-in-cpp/ https://www.italiancpp.org/2014/11/23/anti-if-idioms-in-cpp/#comments Sun, 23 Nov 2014 17:03:10 +0000 http://www.italiancpp.org/?p=3751 Last November 15th I attended the Italian Agile Day in Ancona, where Gianluca Padovani, Marco Foco, and I facilitated a workshop on refactoring in C++ (if you like, read more details here). Before our session, I attended a couple of talks and one was about “IF-Oriented Programming vs Object Oriented Programming”, by Claudio Pattarello (a former colleague of mine). I enjoyed this talk because it was completely on the code (in C#) and rich in interesting examples of removing IFs. I have attended this kind of talk for at least 8 years (the first time was at the university, in 2006) but this time something was different: each solution Claudio introduced could be written in C++ with almost no effort. This wouldn’t have been possible without new C++.

In this post I’d like to show you how to rewrite these idioms in C++, starting from a C# snippet. And the resulting code is pretty much identical. Here are Claudio’s examples. First let me clarify:

  • I’m skipping visitors because they are well-known in C++ (and from C++11, variadic templates could help even more);
  • Claudio’s target was to show different ways to remove IFs and to play with the code. The real point is: don’t say “the IF cannot be removed” but, instead, wonder whether removing it is worth it. For this reason, I’m not discussing edge cases nor saying these rules apply everywhere (they don’t).
  • I’m not saying “this is the best thing you can do in C++”. For example: as Nicola suggested, the first example I’m showing could be rewritten by using an optional type, that is (likely) a more effective thing to do in C++ and it results in less code. Sure, we have specific idioms in our favorite language but I think it’s lovely that we are able to reuse the same code from other languages with a little effort. And this is also appealing for other programmers who want to experience C++, maybe by starting coding idioms they know. Please, stop me if I’m raving!

Let’s start with an example where you have to deal with a null. Here is the first version of the code:

public void NullResult_IF()
{
      var employeeRepository = new EmployeeRepositoryNullIF();
      var employee = employeeRepository.GetOrNull("CLAPAT01");
      if (employee!=null)
         Assert.AreEqual("Claudio", employee.Name);
}

What if the repository does not contain the name we are looking for? It returns a null. The null case has to be handled. Suppose you have to fill a form with employee’s data: using a NullObject doesn’t help much because you won’t fill a form with fake values.

Then, the proposed solution is to add a layer of abstraction and pass an action instead of checking the null case by hand:

public interface IPerformer<out T>
{
    void Ask(Action<T> action);
}

public class Performer<T> : IPerformer<T>
{
    private readonly T value;

    public Performer(T value)
    {
        this.value = value;
    }

    public void Ask(Action<T> action)
    {
        action(value);
    }
}

public class NoPerformer<T> : IPerformer<T>
{
    public void Ask(Action<T> action) { }
}

public class EmployeeRepositoryNullNOIF
{
    //...

    public IPerformer<Employee> Get(string number)
    {
        var employee = repository.SingleOrDefault(e => e.Number == number);
        if (employee==null)
            return new NoPerformer<Employee>();
        return new Performer<Employee>(employee);
    }
}

public void CentralisedNullResult_NOIF()
{
    var employeeRepository = new EmployeeRepositoryNullNOIF();
    var employee = employeeRepository.Get("CLAPAT01");
    employee.Ask(e => Assert.AreEqual("Claudio", e.Name));
}

employeeRepository now returns an IPerformer<T>, an interface you can ask to call a lambda taking a parameter of type T. In this example, the lambda will be called only if the T “is valid”. An alternative is to also provide a lambda for the failing case. But now the juicy part: what does it look like in C++? This is a possible migration:

#include <functional>
#include <iostream>
#include <memory>
#include <map>
using namespace std;

template<typename T>
class IPerformer
{
public:
    virtual ~IPerformer() = default;
    virtual void Ask(function<void(const T&)> f) = 0;
};

template<typename T>
class NoPerformer : public IPerformer<T>
{
public:
    void Ask(function<void(const T&)>) override 
    {

    }
};

template<typename T>
class Performer : public IPerformer<T>
{
public:
    Performer(T _value)
        : value(move(_value))
    {

    }

    void Ask(function<void(const T&)> action) override 
    {
        action(value);
    }

private:
    T value;
};

struct Employee
{
    string Name;
    string Last;
    string Number;
};

class EmployeeRepositoryNullNOIF
{
public:
    unique_ptr<IPerformer<Employee>> Get(const string& number)
    {
        const auto it = employees.find(number);
        if (it != end(employees))
            return make_unique<Performer<Employee>>(it->second);
        return make_unique<NoPerformer<Employee>>();
    }

    map<string, Employee> employees = { 
        {"1", {"Marco", "Arena", "1"}},  
        {"2", {"Claudio", "Pattarello", "2"}}
    };
};

int main()
{
    EmployeeRepositoryNullNOIF repo;
    auto employee = repo.Get("2");
    employee->Ask([](const Employee& e) {
        cout << e.Name << " " << e.Last << endl;
    });
}

It’s just a toy, but you may play with this example. Ha, I forgot to say: C++ examples here are editable and runnable (thanks to the Coliru online compiler and the ACE editor). So, try the code yourself!

C++’s control is finer, thus I considered passing T to Performer by value and then move-constructing the internal one; also, T is passed to the function by const&.

Apart from these details, C++ code is pretty much the same as the C# one.

The next example is a variation of the previous one. It’s about doing something with the result of an operation if it succeeds, otherwise doing something else. Trivial. Here is the C# snippet:

public void ConvertNumber_IF()
{
    int number;
    if (int.TryParse("38", out number))
        Assert.AreEqual(38, number);
    else
        Assert.Fail();
}

To remove this IF, Claudio used a lambda, again:

public void ConvertNumberWrapIf_NOIF()
{
    Action action = Assert.Fail;
    ConvertNumberStatusNOIF.Parse("38", number =>
    {
        action = () => Assert.AreEqual(38, number);
    });
    action();
}

As before, the lambda gets called only if the operation succeeds. C++ code is embarrassingly similar:

#include <iostream>
#include <sstream>
#include <functional>

using namespace std;

class ConvertNumberStatusNOIF
{
public:
    static void Parse(const string& number, function<void(int)> action)
    {
        stringstream ss(number);
        char c;
        int age;
        if (ss >> age && !ss.get(c))
            action(age);
    }  
};

int main()
{
    function<void()> action = []{ cout << "fail\n"; };
    ConvertNumberStatusNOIF::Parse("38", [&](int number)
    {
        action = [=]() { cout << number << endl; };
    });
    action();
}

A variation consists in making action a sort of scoped function (i.e. one that will be executed when the scope ends; I saw this kind of workflow used to handle DB transactions).
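As a sketch of that variation (the ScopedAction name and its interface are made up for illustration), the action runs when the scope ends and defaults to the failure path unless parsing succeeded:

#include <functional>
#include <iostream>

// Hypothetical scope-exit helper: runs the stored action in the destructor
class ScopedAction {
    std::function<void()> action;
public:
    explicit ScopedAction(std::function<void()> a) : action(std::move(a)) {}
    ~ScopedAction() { action(); }
    void reset(std::function<void()> a) { action = std::move(a); }
};

int main() {
    ScopedAction onExit{[]{ std::cout << "fail\n"; }};
    // ... parsing logic would go here; on success the action is replaced:
    onExit.reset([]{ std::cout << 38 << std::endl; });
} // "38" is printed here, when the scope ends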

Going ahead, the following snippet handles error conditions by catching exceptions explicitly:

public void CatchExceptions_IF()
{
    var promocode = new PromocodeStatusIF();
    try
    {
        promocode.Apply("g128g7d2g");
    }
    catch (AlreadyUsedPromocodeException)
    {
        Assert.Pass("Already used");    
    }
    catch (ExpiredPromocodeException)
    {
        Assert.Pass("Expired");
    }
    catch (NotValidPromocodeException)
    {
        Assert.Pass("Not valid");
    }
    Assert.Fail("no exception");
}

Try/Catch can be seen as another kind of IF, maybe more structured. To remove yet another IF, we pass actions instead and we let the promocode checker call the right one for us:

public void RemoveCatchExceptionsAndUseMessages_NOIF()
{
    var promocode = new PromocodeStatusNOIF();
    promocode
        .AlreadyUsed(() => Assert.Pass("Already used"))
        .Expired(() => Assert.Pass("Expired"))
        .NotValid(() => Assert.Pass("Not valid"))
        .Apply("g128g7d2g");

    Assert.Fail("no exception");
}

PromocodeStatusNOIF just calls the right action. No exceptions are involved, no manual handling to do, adding/removing logic is trivial. What does it look like in C++? Not so different:

#include <iostream>
#include <functional>
#include <string>

using namespace std;

class PromocodeStatusNOIF
{
    function<void()> alreadyUsed = []{};
    function<void()> expired = []{};
    function<void()> notValid =  []{};

public:
    PromocodeStatusNOIF& AlreadyUsed(function<void()> action)
    {
        alreadyUsed = action;
        return *this;
    }

    PromocodeStatusNOIF& Expired(function<void()> action)
    {
        expired = action;
        return *this;
    }

    PromocodeStatusNOIF& NotValid(function<void()> action)
    {
        notValid = action;
        return *this;
    }

    void Apply(const string& promocode)
    {
        // logic...
        expired(); 
    }
};

int main()
{
    PromocodeStatusNOIF{}
        .AlreadyUsed([] { cout << "Already used\n"; })
        .Expired([] { cout << "Expired\n"; })
        .NotValid([] { cout << "NotValid\n"; })
        .Apply("g128g7d2g");
}

I don’t have the final Assert.Fail because we are not really testing anything. A variation is to add the “no exception” action either in the constructor or as another function of the chain.

The last example I’ll show you is a singleton factory: the first time you ask for an instance it creates and returns a fresh one; afterwards, that instance is reused. The IF solution is straightforward:

public class FooSingletonLazyFactoryIF
{
    private Foo foo;

    public Foo Get()
    {
        if (foo==null)
            foo = new Foo();
        return foo;
    }
}

Removing the IF is a good exercise of self-modifying lambdas:

public class FooSingletonLazyFactoryNOIF 
{
    public FooSingletonLazyFactoryNOIF()
    {
        Get = () =>
        {
            var foo = new Foo();
            Get = () => foo;
            return Get();
        };
    }

    public Func<Foo> Get { get; private set; }
}

Forget using a local static variable, just enjoy a different way to code. What does it look like in C++? This is a possible implementation:

#include <functional>
#include <iostream>

using namespace std;

struct Foo
{ 
    Foo(int _magic) 
       : magic(_magic)
    { cout << "new Foo" << endl; }
    int magic = 0; 
};

class FooSingletonLazyFactoryNOIF 
{
public:
    FooSingletonLazyFactoryNOIF()
    {
        _Get = [this] 
        {
            _Get = [foo=Foo{256}]() -> const Foo& { return foo; };
            return _Get();
        };
    }

    const Foo& Get() const
    {
        return _Get();
    }

private:
    function<const Foo&()> _Get;
};

int main()
{
    FooSingletonLazyFactoryNOIF factory;
    auto&& foo = factory.Get();
    auto&& foo2 = factory.Get();
    cout<< foo.magic << endl;
    cout<< foo2.magic << endl;
}

Thanks to lambda init-capture I was able to initialize Foo directly in the lambda. This version has a problem: what if the factory goes out of scope? The internal foo is destroyed and we have a dangling reference. Ok, you could make the factory a global object, but there is another solution using smart pointers. Don’t try to move-capture a unique_ptr into the lambda, because it is not possible to construct a std::function from a move-only type (cfr. §20.9.11.2.1 [func.wrap.func.con]). We can use a shared_ptr and return it, or return just a Foo*, or a weak_ptr. For example:

#include <functional>
#include <memory>
#include <iostream>

using namespace std;

struct Foo
{ 
    Foo(int _magic) 
       : magic(_magic)
    { cout << "new Foo" << endl; }
    int magic = 0; 
};

class FooSingletonLazyFactoryNOIF 
{
public:
    FooSingletonLazyFactoryNOIF()
    {
        _Get = [this] 
        {
            auto foo = make_shared<Foo>(256);
            _Get = [foo=move(foo)]() { return foo; };
            return _Get();
        };
    }

    auto Get() const
    {
        return _Get();
    }

private:
    function<weak_ptr<Foo>()> _Get;
};

int main()
{
    FooSingletonLazyFactoryNOIF factory;
    auto foo = factory.Get();
    if (auto spt = foo.lock()) // check not strictly needed here: the factory is still alive
    {
        cout << spt->magic << endl;
    }
}
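
For completeness, here is a sketch of the shared_ptr-returning alternative mentioned above (the class name is mine, purely illustrative): callers share ownership, so the instance can even outlive the factory.

#include <functional>
#include <iostream>
#include <memory>
using namespace std;

struct Foo
{
    Foo(int _magic) : magic(_magic) { cout << "new Foo" << endl; }
    int magic = 0;
};

class FooSingletonLazyFactorySP // illustrative name
{
public:
    FooSingletonLazyFactorySP()
    {
        _Get = [this] {
            auto foo = make_shared<Foo>(256);
            _Get = [foo] { return foo; }; // same self-modifying trick as above
            return _Get();
        };
    }

    shared_ptr<Foo> Get() const { return _Get(); }

private:
    function<shared_ptr<Foo>()> _Get;
};

int main()
{
    shared_ptr<Foo> survivor;
    {
        FooSingletonLazyFactorySP factory;
        survivor = factory.Get();
    } // factory is gone, the instance is still alive
    cout << survivor->magic << endl;
}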

As you can see, the C++ snippets are very similar to the C# ones. Since C++ gives you more control and has no garbage collector, there are more aspects to be careful about. In particular, dealing with self-modifying lambdas can lead to several problems, such as capturing local variables by reference.
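
The usual suspect is a closure that outlives a local it captured by reference; a minimal sketch of the pitfall (my example, not from the original post):

#include <functional>
#include <iostream>
using namespace std;

function<int()> makeCounter()
{
    int count = 0;
    return [&count] { return ++count; }; // count dies when makeCounter returns
}

int main()
{
    auto counter = makeCounter();
    cout << counter() << endl; // undefined behavior: reads a dead object
}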

Using std::function is not optimal, not only because of its runtime cost but also because of its inability to store move-only captured types. I’d also like to share a fact you should care about when dealing with self-modifying lambdas. Suppose you wrote this circular object pool using the same ideas we discussed a moment ago:

class Pool
{
    function<int()> generator;

    function<int()> next(vector<int> v, size_t i)
    {
        return [this, v=move(v), i]{
            auto tmp = v[i];
            auto size = v.size();
            generator = next(move(v), (i+1)%size);
            return tmp;
        };
    }
public:    
    Pool(vector<int> v)
        : generator {next(move(v), 0)}
    {

    }

    int operator()()
    {
        return generator();
    }
};

//...user code
Pool pool { {1,2,3} };
cout << pool() << endl; // 1
cout << pool() << endl; // 2
cout << pool() << endl; // 3
cout << pool() << endl; // 1

You are storing the vector inside generator (which is a std::function). How many copies of this vector do you expect? You could say: “Zero! Just moves, because I’m always moving the vector from one lambda to the next”. I thought so too. But look more carefully at how the next function gets updated: when you init-capture the vector by move, it is indeed moved into the lambda. Ok. What next? You read tmp and size, ok. Then? You update generator… Ah! Moving the vector here is just a copy! Why? Because the lambda is not mutable, so its captures are const, and moving a const object is actually (and silently) a copy. I prepared this snippet to let you play with it (e.g. try removing mutable and watch the sentinel report copies instead of moves):

#include <iostream>
#include <memory>
#include <vector>
#include <functional>
using namespace std;

struct sentinel
{
    sentinel() = default;
    sentinel(const sentinel&) { cout << "copy" << endl; }
    sentinel(sentinel&&) { cout << "move" << endl; }
};

class Pool
{
    function<int()> generator;

    function<int()> next(vector<int> v, sentinel s, size_t i)
    {
        return [this, v=move(v), i, s=move(s)]() mutable {
            auto tmp = v[i];
            auto size = v.size();
            generator = next(move(v), move(s), (i+1)%size);
            return tmp;
        };
    }
public:    
    Pool(vector<int> v)
       : generator {next(move(v), sentinel{}, 0)}
    {

    }

    int operator()()
    {
        return generator();
    }
};

int main()
{
    Pool p { {1,2,3} };
    cout << p() << endl;
    cout << p() << endl;
    cout << p() << endl;
}
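
The root cause boils down to a few lines. Here is a minimal sketch (mine) of the rule at work: std::move on a const object yields a const rvalue, which can only bind to the copy constructor.

#include <iostream>
#include <utility>
using namespace std;

struct probe
{
    probe() = default;
    probe(const probe&) { cout << "copy" << endl; }
    probe(probe&&) { cout << "move" << endl; }
};

int main()
{
    const probe p;     // like a capture inside a non-mutable lambda
    probe q = move(p); // prints "copy": move(p) is a const probe&&
    probe r = move(q); // prints "move": q is not const
}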

The goal of this post was simply to implement Claudio’s examples in C++, taking into account some particularities of our favourite language. As I said, I’m not discussing edge cases, nor am I claiming these idioms apply everywhere (they don’t). If you have more examples, please share them here!

Brace initialization inside a lambda capture list https://www.italiancpp.org/2014/05/26/brace-initialization-inside-a-lambda-capture-list/ Mon, 26 May 2014 13:24:05 +0000 http://www.italiancpp.org/?p=3229 Today I ran into a rather sneaky problem:

#include <iostream>
#include <functional>
#include <vector>
using namespace std;
function<void()> f()
{
    vector<int> v(1000, 0);
    cout << "v size is: " << v.size() << endl;

    return [ v {move(v)} ]() {
        cout << "inside lambda => v size is: " << v.size() << endl;
    };
}

int main()
{
    f() ();
}


If you run the code, the program’s output is (I admit, to my surprise) the following:

v size is: 1000
inside lambda => v size is: 1

The size of v inside the lambda is 1. But why?
As we’ll see, the problem has nothing to do with lambdas per se: it lies in how v is deduced in the capture list.

Given a vector defined like this:

vector<int> v(1000, 0); // creates 1000 ints initialized to 0


and two other objects (u, t) defined like this:

vector<int> u{10,20,30};
auto t{10,20,30};

u is a vector just as we would all expect, fully defined and initialized (via uniform initialization) with three numbers (10, 20, 30); t, instead, is declared with auto, and the compiler deduces it as std::initializer_list<int>! (A quick check below.)
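
Here is that check as a tiny program (my sketch; I use the auto t = {…} spelling, which deduces the same way under the rules as they stood in 2014):

#include <initializer_list>
#include <type_traits>

int main()
{
    auto t = {10, 20, 30}; // auto + braces: deduced as std::initializer_list<int>
    static_assert(std::is_same<decltype(t), std::initializer_list<int>>::value,
                  "auto + braces deduces initializer_list");
}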

That’s why v captured inside the lambda, whose type is deduced as if by auto, has size 1: there we get a std::initializer_list<decltype(v)> (= std::initializer_list<vector<int>>) holding a single element. To avoid the problem we must therefore rewrite our function like this:

#include <iostream>
#include <functional>
#include <vector>
using namespace std;

function<void()> f()
{
    vector<int> v(1000, 0);
    cout << "v size is: " << v.size() << endl;

    return [ v = move(v) ]() { // now v is a vector<int>
        cout << "inside lambda => v size is: " << v.size() << endl;
    };
}

int main()
{
    f()();
}

Now the compiler correctly deduces v in the capture list as a vector<int>, and the result is the expected one!

To conclude: avoid brace initialization inside capture lists and use the VAR = EXPR notation instead, because auto + brace initialization implies that the deduced type is a std::initializer_list.

Reference: Scott Meyers – Effective Modern C++, Item 7

A peek at C++14 https://www.italiancpp.org/2014/02/03/una-sbirciatina-al-cpp14/ Mon, 03 Feb 2014 11:40:45 +0000 http://www.italiancpp.org/?p=2614 C++14 is the informal name of the next revision of the ISO/IEC C++ standard, which could be made official this year. The draft approved by the ISO committee – N3797 – was published on May 15, 2013.

In this short article we look at some of the most interesting features already available in Clang (each topic links to the relevant paper/draft). We give you the chance to try some of the examples directly in the article. Any author can use these “compilable snippets” in their own articles, so if you feel like writing this kind of article, step right up!

Generic lambdas & initialized capture

When writing a lambda, how many times have you wondered: “why doesn’t the compiler deduce the parameter types automatically?!” For example, in a for_each:

vector<int> v;
for_each(begin(v), end(v), [](int i) {
   cout << i << endl;
});

First of all, the compiler knows which type belongs there; but not only that: the same lambda (apart from the parameter) could be reused elsewhere to print any object supporting operator<<(ostream&):

auto printer = [](string s) {
   cout << s << endl;
};

In C++14 it is possible to create generic lambdas (also called polymorphic lambdas) via auto:

#include <iostream>

using namespace std;

int main()
{
   auto printer = [](auto value) {
      cout << "PRINTING: " << value << endl;
   };

   printer(10);
   printer("hello");
}

A stateless lambda (with an “empty” capture list), just as in C++11, can be converted to an appropriate function pointer:

auto printer = [](auto value) {
    cout << value << endl;
};

void(*printIntFn)(int) = printer;
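
Here is the conversion in a complete little program (a sketch of mine; the compiler instantiates the generic call operator for int when we pick the function-pointer type):

#include <iostream>
using namespace std;

int main()
{
    auto printer = [](auto value) {
        cout << value << endl;
    };

    void (*printIntFn)(int) = printer; // instantiates operator() with value as int
    printIntFn(42);                    // prints 42
}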

Another limitation of lambdas concerns capture, which is allowed only by copy and by reference, effectively ruling out a “by-move” capture. A few workarounds exist – such as bind, or hiding a move underneath a copy – but these are just tricks to dodge a language limitation.
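
For instance, the bind workaround looks roughly like this (my sketch): bind stores the moved unique_ptr inside the call wrapper and hands it to the callable as an lvalue.

#include <functional>
#include <iostream>
#include <memory>
using namespace std;

int main()
{
    unique_ptr<int> ptr{new int{10}};

    // bind move-constructs its stored copy of ptr (C++11-compatible)
    auto closure = bind([](unique_ptr<int>& p) {
        cout << "From closure: " << *p << endl;
    }, move(ptr));

    closure();
}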

In C++14 the capture-list syntax allows genuine variable initializations (this new feature is in fact called initialized lambda capture):

#include <iostream>
#include <memory>
#include <vector>
#include <numeric>
using namespace std;

int main()
{
    unique_ptr<int> ptr {new int{10}};

    auto closure = [ptr = move(ptr)]{
        cout << "From closure: " << *ptr << endl;
    };

    closure();

    if (ptr) // is ptr valid?
        cout << "ops...move didn't work..." << endl;

    vector<int> v{1,2,3,4,5};

    auto printSum = [sum = accumulate(begin(v), end(v), 0)]{
        cout << "Sum is: " << sum << endl;
    };

    printSum();
}

We suggest not abusing this notation… there is no need to write all of your code between [ ] 🙂

A remarkable aspect of this syntax is the ability to create closures with “full ownership”: in the previous example, had we used a shared_ptr, the lambda would have somehow shared ownership of the pointer with the scope in which it is defined. Moving a unique_ptr into the lambda, by contrast, completely transfers ownership of the pointer into the closure. There is no shortage of cases where this new syntax will come in really handy, especially in multi-threaded code.
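
For example, here is a sketch (my example, not from the original article) of full-ownership transfer into another thread: the worker owns the buffer outright, so there are no lifetime races with the creating scope.

#include <future>
#include <iostream>
#include <memory>
#include <vector>
using namespace std;

int main()
{
    auto buffer = make_unique<vector<int>>(1000, 1);

    // the closure (and thus the worker task) fully owns the buffer
    auto worker = async(launch::async, [buf = move(buffer)] {
        int sum = 0;
        for (int x : *buf)
            sum += x;
        return sum;
    });

    cout << worker.get() << endl; // 1000
}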

Return type deduction for normal functions

In C++11 the return type of a lambda is deduced automatically only if its body consists of a single expression. In C++14, lambda return type deduction is extended to more complicated cases as well.
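
For example (my sketch): C++11 as published only deduced the return type of a body of the form { return expression; }, while in C++14 this compiles and deduces int:

auto factorial = [](int n) {
    auto result = 1;
    for (auto i = 2; i <= n; ++i)
        result *= i;
    return result; // return type deduced as int, despite the multi-statement body
};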

But not only that: automatic deduction is also enabled for ordinary functions, through two different notations:

// auto-semantics
auto func() {...}

// decltype-semantics
decltype(auto) func() {...}

In the first case the return type is deduced following the semantics of auto (e.g. ref-qualifiers are dropped), in the second, those of decltype. Let’s look at an example with auto:

#include <iostream>
#include <type_traits>
#include <vector>
using namespace std;

template<typename T>
auto sum(T&& a, T&& b)
{
    return a+b;
}

int main()
{
    cout << "Summing stuff:" << endl;
    cout << sum(1, 2) << endl;
    cout << sum(1.56, 3.66) << endl;
    cout << sum(string{"hel"}, string{"lo"}) << endl;
}

A scenario in which this notation would be rather inappropriate (in fact, the example does not compile):

#include <memory>
using namespace std;

struct Base {};
struct Derived : Base {};

shared_ptr<Base> share_ok(int i) // ok
{
    if (i)
        return make_shared<Base>();
    return make_shared<Derived>();
}

auto share_ops(int i) // ops
{
    if (i)
        return make_shared<Base>();
    return make_shared<Derived>();
}

int main()
{
    auto shared1 = share_ok(1);
    auto shared2 = share_ops(1); 
}

In any case, we’ll wait until we have used this feature in production before giving guidelines or stylistic suggestions.

And here are auto and decltype(auto) side by side:

#include <iostream>
#include <type_traits>
#include <vector>
using namespace std;

vector<int> v{1,2,3,4};

decltype(auto) getDecltype(size_t index)
{
    return v[index];
}

auto getAuto(size_t index)
{
    return v[index];
}

int main()
{
    auto val = getAuto(0); 
    auto anotherVal = getDecltype(1);
    auto& ref = getDecltype(0);
    ref += 10; // aka: v[0] += 10;
    cout << "copied v[0] = " << val << endl;
    cout << "copied v[1] = " << anotherVal << endl;

    cout << "final vector: [ ";
    for (auto i : v)
        cout << i << " "; 
    cout << "]" << endl;
}

This simple example shows that, even though a vector’s operator[] returns a reference, auto’s deduction semantics require ref-qualifiers to be dropped. That is why getAuto() returns an int (by copy). Conversely, decltype(auto) uses the deduction rules of decltype, which preserve the qualifiers (and hence the fact that operator[] returns a reference). For a more accurate explanation of auto and decltype we recommend this article by Thomas Becker. Try playing with the example directly in the article. If you tend to write generic code, you will find these two additions very useful!
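
As a taste of that, here is a sketch of where decltype(auto) earns its keep in generic code (invokeAndTrace is my illustrative name): a wrapper that must preserve the exact return type, reference included, of whatever it wraps.

#include <iostream>
#include <utility>
#include <vector>
using namespace std;

// decltype(auto) preserves whether f returns by value or by reference
template<typename F, typename... Args>
decltype(auto) invokeAndTrace(F&& f, Args&&... args)
{
    cout << "calling..." << endl;
    return forward<F>(f)(forward<Args>(args)...);
}

int main()
{
    vector<int> v{1, 2, 3};
    auto at = [&v](size_t i) -> int& { return v[i]; };

    invokeAndTrace(at, 0) = 10; // the reference survives the wrapper
    cout << v[0] << endl;       // 10
}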

Variable templates

In C++11 there is no way to parameterize a constant directly with templates, as is instead possible for classes and functions. Workarounds are generally used, such as classes defining a series of static constexpr members inside (e.g. see numeric_limits).
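
A sketch of that classic C++11 workaround (pi_constant is an illustrative name of mine), to contrast with the C++14 version below:

#include <iostream>
using namespace std;

template<typename T>
struct pi_constant
{
    static constexpr T value = T(3.1415926535897932385);
};

// out-of-class definition, required in C++11 when value is odr-used
template<typename T>
constexpr T pi_constant<T>::value;

int main()
{
    cout << pi_constant<double>::value << endl;
    cout << pi_constant<float>::value * 1.5f * 1.5f << endl; // area of a circle
}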

Since C++14 it is permitted to define constexpr variable templates (or simply variable templates), for example:

#include <iostream>
#include <iomanip>
using namespace std;

template<typename T>
constexpr T pi = T(3.1415926535897932385);

template<typename T>
T areaOfCircle(T r) 
{
    return pi<T> * r * r;
}

int main()
{
    cout << setprecision(10);
    cout << "PI double = " << pi<double> << endl;
    cout << "PI float = " << pi<float> << endl;
    cout << "Area double = " << areaOfCircle(1.5) << endl;
    cout << "Area float = " << areaOfCircle(1.5f) << endl;
}

Our quick tour of C++14 is complete. Don’t hesitate to leave a comment with your impressions. And if a comment isn’t enough, write a whole article! Contact us and we will help you publish your “piece” on ++it – the mini-compilers are available to everyone!
