Lock-based implementations (using mutexes for synchronization) are natural, but have several problems:
- possibility of deadlock if not careful with the order in which locks are acquired
- priority inversion - a low-priority thread is holding a lock required for a high-priority thread to continue
- if the thread currently holding a mutex is slowed down (I/O, page fault, etc.), all other threads that use the same critical section have to wait

Lock-free algorithms

We need a special instruction - some kind of CAS (compare and exchange):

    bool CAS( location, expected, new )

CAS is atomic and executes

    if ( location == expected ) {
        location = new;
        return true;
    } else {
        expected = location;
        return false;
    }

C++11 defines compare_exchange_* on std::atomic types:

    ///////////////
    // from cas.cpp
    ///////////////
    std::atomic<int> counter( 0 );
    int expected = 0;        // compare_exchange_* needs a reference (will write there if it failed)
    //++counter;
    std::cout << "Count " << counter << std::endl;
    if ( counter.compare_exchange_strong( expected, 1 ) ) {
        //                                location = counter, expected = expected, new = 1
        std::cout << "Success" << std::endl;
        std::cout << "Count " << counter << std::endl;
    } else {
        // if it returns false, it replaces expected with the contained value
        std::cout << "Failed: found " << expected << " instead" << std::endl;
    }

The basic structure of a lock-free algorithm is:

    // implement insert for some data structure
    Container data;
    void insert( int new_val ) {
    1:    do {
    2:        old_state = data;              // local copy of what we think the DS looks like when we insert
    3:        new_state = data;              // local value to be modified
    4:        new_state.insert( new_val );   // modify the local copy
    5:    } while ( CAS( data, old_state, new_state ) == false );
    }

The idea:
a) get a local copy of data
b) update the local copy
c) try to substitute data with the updated copy, IF data has not changed

a,b) insert into a COPY, so that other threads can safely use data while we are inserting.
c)   when done inserting into the copy, decide whether we can substitute data with new_state. This is OK only if nobody updated data in the meantime. To check whether that is the case use CAS - provide the recorded old_state (from line 2). old_state is what we think data looks like; if it does not match, someone has updated data while we were executing lines 3-4, so we have to start over.

See lockfree_sorted_vector.cpp for C++ code. The main loop:

    void Insert( int const & v ) {
        std::vector<int> *pdata_new = nullptr, *pdata_old;
        do {
            delete pdata_new;    // delete the pdata_new created in the previous iteration;
                                 // on the first iteration this is "delete nullptr;" which is a no-op
            pdata_old = pdata;
            pdata_new = new std::vector<int>( *pdata_old );
            // perform the insert in order into pdata_new
        } while ( !(this->pdata).compare_exchange_weak( pdata_old, pdata_new ) );
    }

If we have to copy the whole data structure to perform an update, plus there is a possibility that the update has to be repeated, the average performance of the algorithm will probably be pretty low. There are 2 possibilities:
1) we know that updates are rare and some other lock-free operation is performed much more often (the index operator, for example) - many reads, few writes. Then the previous implementation is viable (a sketch of such a read is shown below).
2) the data structure allows local updates - without copying all the data.
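As an illustration of case 1), the read can be very cheap: atomically load the current vector pointer and index into that snapshot. This is only a sketch of a hypothetical LFSV read path, assuming pdata is declared as std::atomic< std::vector<int>* >; it also ignores the reclamation/ABA issues discussed below:

    int operator[] ( int pos ) const {
        std::vector<int>* snapshot = pdata.load();   // grab the currently published vector
        return (*snapshot)[ pos ];                   // writers never modify a published vector in place -
                                                     // they swap in a brand new copy - so the read is safe
                                                     // as long as the old vector is not deleted under us
    }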
For case 2) see the next example.

Lock-free list: see lockfree_push_pop_front.cpp

    // the list is represented by the head pointer
    std::atomic<Node*> list_head( nullptr );

    void push_front( int val ) {
        Node* oldHead = list_head;                  // get a copy of the head (NOT the whole data structure)
        Node* newNode = new Node { val, oldHead };  // prepare the new node - assume it will point to the old head
        while ( !list_head.compare_exchange_weak( oldHead, newNode ) ) {   // try to compare-and-swap the head and the new node
            newNode->next = oldHead;                // if it failed, someone inserted while we were thinking -
                                                    // fix the new node to point to the updated head and try again
        }
    }

    int pop() {   // assumes a non-empty list
        Node* oldHead = list_head;                  // get a copy of the head (NOT the whole data structure)
        // try to compare-and-swap the head with the next node - effectively changing the head of the list;
        // oldHead will contain the popped data
        // nothing to update inside the loop: if CAS failed - i.e. someone modified the head while we
        // were between the previous and the next line - CAS updates oldHead with the new head pointer
        while ( !list_head.compare_exchange_weak( oldHead, oldHead->next ) ) { }
        // retrieve the popped data and delete the popped node
        int ret_val = oldHead->value;
        delete oldHead;
        return ret_val;
    }

Compile with

    g++ -std=c++11 -pthread lockfree_push_pop_front.cpp

and run a.exe (or a.out). main inserts n+2 nodes and then pops n nodes, so the final list contains 2 nodes. Notice that due to the uncertainty of thread scheduling we do not know which 2 nodes will be in the list. (A self-contained sketch of such a program is given after the ABA discussion below.)

Notice 2 problems in the above 2 examples:

Problem 1: we do not delete the old state in LFSV - a leak. The reason is that the old state may still be in use by another writer or another reader; deleting it would cause a crash.

Problem 2: much harder to notice - the ABA problem.

ABA problem

    thread A                                     | thread B
    ---------------------------------------------+------------------------------------------------
    // pop                                       |
    oldHead = list_head; // say 0xeb02           |
    // the next line is not atomic - first it    |
    // reads its arguments, then calls CAS:      |
    // it may read oldHead->next                 |
    // and then get preempted by thread B        |
                                                 | // pop 1
                                                 | oldHead = list_head;
                                                 | while ( !CAS( ... ) ) { }
                                                 | delete oldHead;   // address 0xeb02 is freed
                                                 | // at this point the head is some other address
                                                 | // push 4 (first push)
                                                 | oldHead = list_head;
                                                 | newNode = new Node { val, oldHead };
                                                 | while ( !CAS( ... ) ) { ... }
                                                 | // push 5 (second push)
                                                 | oldHead = list_head;
                                                 | newNode = new Node { val, oldHead };  // new returned 0xeb02
                                                 | while ( !CAS( ... ) ) { ... }
                                                 | // after this line the head is 0xeb02
    // oldHead matches list_head !!              |
    // CAS continues with the old value          |
    // of oldHead->next                          |
    while ( !list_head.CAS( oldHead,             |
             <next value read before> ) ) { }    |

The list will look like this: initially 1->2->3.

Thread A starts popping 1: it gets the address of the node containing 1 - say 0xeb02 - and also reads oldHead->next - the address of node 2. Then B kicks in:

    delete 1:   2->3
    insert 4:   4->2->3
    insert 5:   5->4->2->3      // note that the address of node 5 is 0xeb02

Thread A continues: oldHead matches the current list_head, so CAS proceeds with the assignment, but it uses the stale oldHead->next that points to node 2, resulting in

    list_head -> 2 -> 3     // nodes 5 and 4 are no longer reachable from the head - they are lost;
    5 -> 4 ------^          // thread A then returns oldHead->value (which is now 5, not 1) and deletes 0xeb02

The ABA problem is also present in the LFSV class implementation. The problem is a little different there, since the whole data structure is swapped on update (not just the head pointer). For example, it will show up even with just 2 Inserts.
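For reference, here is a self-contained sketch of what a file like lockfree_push_pop_front.cpp could look like. The Node layout and the driver in main are assumptions made for illustration (they are not the actual course file), and like the fragments above it is NOT safe in general - the reclamation/ABA issues just described still apply:

    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>

    struct Node {          // assumed layout: payload plus a pointer to the next node
        int   value;
        Node* next;
    };

    std::atomic<Node*> list_head( nullptr );

    void push_front( int val ) {                    // same logic as the fragment above
        Node* oldHead = list_head;
        Node* newNode = new Node { val, oldHead };
        while ( !list_head.compare_exchange_weak( oldHead, newNode ) ) {
            newNode->next = oldHead;                // head changed - re-link and retry
        }
    }

    int pop() {                                     // same logic as the fragment above (non-empty list)
        Node* oldHead = list_head;
        while ( !list_head.compare_exchange_weak( oldHead, oldHead->next ) ) { }
        int ret_val = oldHead->value;
        delete oldHead;                             // freeing and reusing addresses is what enables ABA
        return ret_val;
    }

    int main() {
        const int n = 1000;
        std::vector<std::thread> threads;
        for ( int i = 0; i < n + 2; ++i )           // n+2 concurrent pushes
            threads.emplace_back( push_front, i );
        for ( auto& t : threads ) t.join();
        threads.clear();
        for ( int i = 0; i < n; ++i )               // n concurrent pops
            threads.emplace_back( [](){ pop(); } );
        for ( auto& t : threads ) t.join();
        for ( Node* p = list_head; p != nullptr; p = p->next )
            std::cout << p->value << std::endl;     // 2 nodes remain; which 2 depends on scheduling
        return 0;
    }

Note that in this driver all pushes are joined before the pops start; to actually hit the ABA interleaving described above, pushes and pops have to overlap in time.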
Solutions to the ABA problem:

Solution 1: do not delete old nodes at all - a memory leak. Acceptable only if deletes are very rare and/or happen at the very end of the program execution.

Solution 2: deleted nodes are sent to a queue where they spend some time. This will stop (or rather minimize) the possibility of ABA: since nodes are not deleted immediately, the new node created by push/Insert will not be the one that was just removed.

Sample code (using LFSV):

    class GarbageRemover {
        std::deque< std::pair< std::vector<int>*,                                     // pointer
                               std::chrono::time_point< std::chrono::system_clock >   // time the vector was received
                             > > to_be_deleted;
        std::mutex m;
        std::atomic<bool> stop;
        std::thread worker;

        void WatchingThread() {
            while ( !stop ) {
                std::this_thread::sleep_for( std::chrono::milliseconds( 20 ) );
                std::lock_guard<std::mutex> lock( m );
                if ( !to_be_deleted.empty() ) {    // peek
                    std::chrono::duration<double, std::milli> how_old =
                        std::chrono::system_clock::now() - to_be_deleted[0].second;
                    if ( how_old > std::chrono::duration<double, std::milli>( 250 ) ) {   // 250 ms waiting period
                        std::vector<int>* p = to_be_deleted[0].first;
                        to_be_deleted.pop_front();
                        delete p;
                    }
                }
            }
            for ( auto& pt : to_be_deleted ) delete pt.first;   // free the rest
        }
        .....
    };

Instead of deleting vectors, send them to the GarbageRemover:

    gr.Add( pdata_old );

Solution 3: similar to 2, it creates a delay before memory can be reused: for example a std::deque, where deleted nodes are inserted on one end and memory for new nodes is returned from the other. The deque is created with enough nodes in it to guarantee that the time required for a node to be inserted, travel from one end to the other, and be returned back to the user is long enough to eliminate ABA.

Sample code (using LFSV):

    class MemoryBank {
        std::deque< std::vector<int>* > slots;
    public:
        MemoryBank() : slots( 6000 ) {
            for ( int i = 0; i < 6000; ++i ) {
                slots[i] = reinterpret_cast< std::vector<int>* >( new char[ sizeof( std::vector<int> ) ] );
            }
        }
        ......
    };

All places that allocate a new vector now use placement new:

    LFSV() : mb(), pdata( new ( mb.Get() ) std::vector<int> ) { }

All places that delete vectors now clean up the data (call std::vector::~vector()) and return the memory to the bank:

    pdata.load()->~vector();
    mb.Store( pdata.load() );

Either of these will solve LFSV with inserts only. But by adding a reader we are back to square one: Insert is about to substitute the vector with the updated one, but another thread has started a read operation. If the corresponding vector is deleted, the read will be using a dangling pointer. (Reference: "Lock-Free Data Structures" by Andrei Alexandrescu - see the link on the web-site.)

To solve this problem we may remember the reference counting pattern: a reader increases the reference count for the duration of the read operation, and the writer keeps an eye on the counter - if the counter is not 1, it keeps looping until the readers are done. One catch is that if the reference counter is allocated separately from the vector pointer (not in a contiguous location) we would need a so-called DCAS or CAS2:

    DCAS( loc1, expected1, loc2, expected2, new_value1, new_value2 )

which is not standard yet. To fix this - wrap the pointer and the counter into a struct (which ensures side-by-side allocation) and make an object of that struct atomic:

    struct Pair {
        std::vector<int>* pointer;
        int counter;
    } __attribute__(( aligned( 16 ), packed ));   // bug in GCC 4.8, fixed in 5.1
                                                  // the alignment is needed to stop std::atomic::load from segfaulting

The "__attribute__" is a GNU-specific compiler directive.
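Whether std::atomic on such a 16-byte Pair is actually lock-free (i.e. whether the compiler emits a double-width CAS such as cmpxchg16b on x86-64) depends on the compiler and flags; with GCC/Clang it may need -mcx16 and/or -latomic. The following is an illustrative check, not part of the course code - it performs the same compare_exchange pattern on the pointer+counter pair that the reference-counted LFSV relies on:

    #include <atomic>
    #include <iostream>
    #include <vector>

    struct Pair {
        std::vector<int>* pointer;
        int               counter;
    } __attribute__(( aligned( 16 ), packed ));

    int main() {
        std::atomic<Pair> data( Pair{ nullptr, 1 } );
        std::cout << "lock-free? " << data.is_lock_free() << std::endl;

        Pair expected = data.load();     // what we believe is currently stored
        Pair desired  = expected;        // copy it and bump the counter, as a reader would
        ++desired.counter;
        while ( !data.compare_exchange_weak( expected, desired ) ) {
            desired = expected;          // CAS failed: expected was refreshed with the stored value,
            ++desired.counter;           // rebuild desired and retry
        }
        std::cout << "counter is now " << data.load().counter << std::endl;   // prints 2
        return 0;
    }

If is_lock_free() reports 0 the code still works, but the implementation may fall back to an internal lock, which defeats the purpose.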
New insert (incomplete!!!):

    void insert( int new_val ) {
        Pair pdata_new, pdata_old;
        do {
            pdata_old.pointer = pdata.load().pointer;
            pdata_old.counter = 1;                     // !!!!!! expect counter == 1, i.e. no active readers
            pdata_new.pointer = new updated data;      // local - not shared - variable
            pdata_new.counter = 1;                     // !!!!!!
        } while ( CAS( pdata, pdata_old, pdata_new ) == false );
        remove pdata_old;                              // reclaim the old vector (GarbageRemover / MemoryBank)
    }

    int operator[] ( int pos ) {    // not a const method anymore
        Pair pdata_new, pdata_old;
        do {                        // before the read - increment the counter, use CAS
            pdata_old = pdata.load();
            pdata_new = pdata_old;
            ++pdata_new.counter;
        } while ( !CAS( pdata, pdata_old, pdata_new ) );

        // the counter is >1 now - safe to read
        int ret_val = read the value at position pos;

        do {                        // before returning - decrement the counter, use CAS
            pdata_old = pdata.load();
            pdata_new = pdata_old;
            --pdata_new.counter;
        } while ( !CAS( pdata, pdata_old, pdata_new ) );

        return ret_val;
    }
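Putting the pieces together, here is a compilable sketch of a reference-counted LFSV along these lines. It is a simplified illustration, not the actual course implementation: the counter is made pointer-sized so the Pair is exactly 16 bytes with no padding, reclamation of the replaced vector is omitted (it would go through the GarbageRemover or MemoryBank shown above), and it may need -mcx16 and/or -latomic to build:

    #include <atomic>
    #include <iostream>
    #include <vector>

    class LFSV {
        struct Pair {                       // pointer and reader count, swapped as a single unit
            std::vector<int>* pointer;
            long long         counter;      // pointer-sized: the struct is exactly 16 bytes, no padding
        } __attribute__(( aligned( 16 ) ));

        std::atomic<Pair> pdata;

    public:
        LFSV() : pdata( Pair{ new std::vector<int>(), 1 } ) { }

        ~LFSV() { delete pdata.load().pointer; }

        void Insert( int v ) {
            Pair pdata_old, pdata_new;
            pdata_new.pointer = nullptr;
            do {
                delete pdata_new.pointer;                    // copy made in a failed iteration
                pdata_old.pointer = pdata.load().pointer;
                pdata_old.counter = 1;                       // succeed only if no readers are active
                pdata_new.pointer = new std::vector<int>( *pdata_old.pointer );
                pdata_new.counter = 1;
                // in-order insert into the private copy
                auto it = pdata_new.pointer->begin();
                while ( it != pdata_new.pointer->end() && *it < v ) ++it;
                pdata_new.pointer->insert( it, v );
            } while ( !pdata.compare_exchange_weak( pdata_old, pdata_new ) );
            // NOTE: pdata_old.pointer is leaked here on purpose - in a full version it would be
            // handed to the GarbageRemover / MemoryBank instead of being deleted right away
        }

        int operator[] ( int pos ) {                         // not const - it updates the counter
            Pair pdata_old, pdata_new;
            do {                                             // announce the read: bump the counter
                pdata_old = pdata.load();
                pdata_new = pdata_old;
                ++pdata_new.counter;
            } while ( !pdata.compare_exchange_weak( pdata_old, pdata_new ) );

            int ret_val = ( *pdata_new.pointer )[ pos ];     // counter > 1, so a writer cannot swap
                                                             // the vector out while we are reading

            do {                                             // done reading: release the counter
                pdata_old = pdata.load();
                pdata_new = pdata_old;
                --pdata_new.counter;
            } while ( !pdata.compare_exchange_weak( pdata_old, pdata_new ) );

            return ret_val;
        }
    };

    int main() {
        LFSV v;
        for ( int i = 10; i > 0; --i ) v.Insert( i );
        for ( int i = 0; i < 10; ++i ) std::cout << v[i] << ' ';
        std::cout << std::endl;                              // prints 1 2 3 ... 10
        return 0;
    }

The writer only succeeds when it observes counter == 1 (no active readers) and always publishes a fresh Pair with counter 1; a reader bumps the counter before touching the vector and drops it afterwards, so the vector it is reading cannot be swapped out from under it.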