<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on Miguel Fernandez Arce</title><link>https://muit.xyz/posts/</link><description>Recent content in Posts on Miguel Fernandez Arce</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>&lt;a href="https://github.com/muit" rel="noopener">Miguel Fernandez Arce&lt;/a></copyright><lastBuildDate>Tue, 01 Aug 2023 00:00:00 +0000</lastBuildDate><atom:link href="https://muit.xyz/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>Making your own array</title><link>https://muit.xyz/posts/making-your-own-array/</link><pubDate>Tue, 01 Aug 2023 00:00:00 +0000</pubDate><guid>https://muit.xyz/posts/making-your-own-array/</guid><description>Just use std::vector.
Which is what I did until a few weeks ago, when I decided enough was enough!
It was about time I made an array type for the needs of my library. In this post, I will go through the design decisions taken while doing so: Creating a custom array container.
Why a custom array? Until now, I used a wrapper around std::vector, which was okay… No, really. But:</description><content type="html"><![CDATA[<p>Just use <strong>std::vector</strong>.</p>
<p>Which is what I did until a few weeks ago, when I decided enough was enough!</p>
<p>It was about time I made an array type for the needs of my library. In this post, I will go through the design decisions taken while doing so: Creating a custom array container.</p>
<h2 id="why-a-custom-array">Why a custom array?</h2>
<p>Until now, I used a wrapper around <code>std::vector</code>, which was okay…  No, really. But:</p>
<ul>
<li>It makes solutions to simple problems unnecessarily complex.</li>
<li>Its API is almost completely built around iterators.</li>
<li>It has an allocator type as a template parameter.</li>
<li>There is no built-in (or easy) way to have inline memory (try with allocators if you want to sacrifice 500 lines of code to the gods and obtain shitty syntax in return).</li>
<li>It has an extensive &amp; rigid API with years of features that I don&rsquo;t want or need to maintain.</li>
<li><code>std::vector&lt;bool&gt;</code>? Really? Burn it.</li>
</ul>
<p>And many others really, but most importantly:</p>
<ul>
<li><strong>It&rsquo;s fun to do your own stuff sometimes</strong>, not going to lie.</li>
</ul>
<p>These points are not necessarily the wrong choice for the standard library considering its scope, but for me, <em>they very much are</em>.</p>
<p>We humans should understand how the tools we use work. Otherwise, we could be using them the wrong way, or using the wrong tool altogether. And containers are a tool like any other.
If you have ever read the code inside std::vector, no matter which std implementation it was, I wouldn&rsquo;t be surprised if you chose not to stick around.</p>
<p>Std implementations are often unintelligible, in good part because the design they are built on has a long list of requirements that add up.</p>
<p>Some honorable mentions from the previous points:</p>
<ul>
<li>The iterator-based API forces functions to be their own templates, where parameters could be iterators of any type, and many extra checks need to be run. The abstraction layer it adds, over simply using indexes, is not free either.</li>
<li>Allocators make compatibility across otherwise equivalent vectors a nightmare. They try to solve memory allocation, yet fail to be of real use in real scenarios, and they multiply the number of compiled class variations (which makes compiling slower). Not to forget, they also guarantee a complex implementation.</li>
</ul>
<h2 id="about-pipes-requirements">About Pipe&rsquo;s Requirements</h2>
<p><strong><a href="https://github.com/PipeRift/pipe">Pipe</a></strong>, the library that will contain these shiny new arrays, is the foundational library I use on most of my C++ projects. It has many great experimental features that I have repeatedly failed to share with others like they deserve.</p>
<p>I have used this library for more than 9 years, and overcoming the limitations of std::vector was increasingly frustrating, especially when I needed to squeeze out extra performance with features like inline memory.</p>
<blockquote>
<p><em>By “<strong>inline memory</strong>” I mean having N items contained directly inside the array&rsquo;s instance</em></p>
</blockquote>
<p>I needed an Array type that:</p>
<ul>
<li>Natively supports inline memory, without sacrificing the syntax or user experience.</li>
<li>Integrates with <em>arenas</em> to control the memory it allocates.</li>
<li>Has a combined index and iterator based API with an extensive list of helpers.</li>
<li>Has a simple implementation. This is a MUST.</li>
</ul>
<h2 id="the-design">The Design</h2>
<p>Let&rsquo;s see how we can achieve reasonable simplicity for arrays.</p>
<p>In Pipe, any container with a contiguous list of elements, whether it owns it or not, inherits from <em>IArray</em> (I welcome better name suggestions).</p>
<p>This class is not intended for the user to use directly, but it provides shared functionality for finding, checking, sorting, swapping and iterating the elements in the list.</p>
<p>Two classes use <em>IArray</em> (and some aliases too):</p>
<ul>
<li><strong>View</strong>: Points to one or more contiguous elements that <strong>it does not own</strong>. These elements can be literals, arrays, or any other pointer with a size. Equivalent to <em>std::span</em>, or what is sometimes called an <em>ArrayView</em>.</li>
<li><strong>InlineArray</strong>: Owns a contiguous, mutable list of elements. It can use an optional inline buffer for performance. Because of this, it <em>does not use allocators</em>. Somewhat equivalent to std::vector or other array implementations.</li>
<li><strong>Array</strong>: An alias for <em>InlineArray</em> with an inline buffer size of 0, meaning it uses exclusively allocated memory.</li>
</ul>
<p>There can be other aliases like <strong>SmallArray</strong> that use different combinations of the inline buffer, but the point is that there is a single implementation class for arrays.</p>
<h3 id="allocation">Allocation</h3>
<p>Let&rsquo;s go back to “<em>does not use allocators</em>”:</p>
<p>Over the years, I have seen and used many implementations of arrays. Like everything, they have advantages and disadvantages. It is a balance. However, those that used templated allocators were especially rigid, verbose or complex (or all three).</p>
<p>Usually, you want to solve two problems with allocators:</p>
<ul>
<li>Control how and where the container’s memory is allocated.</li>
<li>Inject and use inline elements in the container.</li>
</ul>
<p>Optionally, you may want to share these allocators between different containers.</p>
<p>Sharing allocators sounds ideal, but is very problematic when you also want to achieve the other points. Different containers allocate differently. If an allocator is used in an array, you know it only needs to maintain a single block of memory. However, <em>maps, sets, or page buffers</em> don&rsquo;t work this way, and can allocate many blocks. They have requirements that can be incompatible with each other.</p>
<p>Most allocators also need to know the type the container holds, so they need to be templates. They have a dependency between the memory and the type since many times they are the ones doing the copying of elements, among other operations.</p>
<blockquote>
<p>Okay, but they surely must have many uses… right?</p>
</blockquote>
<p>I think it is pretty rare, I would even dare to say extremely rare, to see in your everyday life a container allocator that is <strong>not</strong> for inline memory or for a very specific use.</p>
<p>If we imagine we had an “inline allocator” in different APIs, it could look like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span>std<span style="color:#f92672">::</span>vector<span style="color:#f92672">&lt;</span>String, InlineAllocator<span style="color:#f92672">&lt;</span>String, <span style="color:#ae81ff">5</span><span style="color:#f92672">&gt;&gt;</span> values; <span style="color:#75715e">// standard library
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>TArray<span style="color:#f92672">&lt;</span>String, InlineAllocator<span style="color:#f92672">&lt;</span><span style="color:#ae81ff">5</span><span style="color:#f92672">&gt;&gt;</span> values; <span style="color:#75715e">// unreal engine
</span></span></span></code></pre></div><p>In Pipe, this would look a bit different:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span>InlineArray<span style="color:#f92672">&lt;</span>String, <span style="color:#ae81ff">5</span><span style="color:#f92672">&gt;</span> values;
</span></span></code></pre></div><p>I chose to split the problem of allocation:</p>
<ul>
<li><strong>Inline memory</strong> is handled by the array itself.</li>
<li><strong>Allocated memory</strong> is handled by arenas.</li>
</ul>
<h4 id="inline-memory">Inline Memory</h4>
<p>It is handled by the array itself.
When we use, for example, <code>InlineArray&lt;T, 5&gt;</code>, the array can hold up to <strong>5</strong> elements inline. If we exceed this capacity, it switches to allocated memory. Similarly, if the elements fit again, it moves them back from allocated memory into the inline buffer.
Of course, you can use an inline buffer of size 0; this is actually very common.</p>
<p>The user does not need to remember how to use inline memory since it is always available on the container.</p>
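<p>To make the idea concrete, here is a minimal sketch of an array with an inline buffer. It is <em>not</em> Pipe&rsquo;s <code>InlineArray</code>: the name <code>SketchInlineArray</code>, its functions, the growth factor and the moment it moves back inline are all assumptions made for illustration, and it only supports trivially copyable types to stay short.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical sketch (not Pipe's implementation): keeps up to N elements
// inside the object itself and falls back to the heap beyond that.
template<typename T, std::size_t N>
class SketchInlineArray
{
	static_assert(std::is_trivially_copyable<T>::value, "sketch only supports trivial types");

	// At least one slot so the buffer is never a zero-length array.
	T inlineBuffer[N > 0 ? N : 1]{};
	T* data = inlineBuffer;
	std::size_t size = 0;
	std::size_t capacity = N;

public:
	SketchInlineArray() = default;
	// Copying is left out of the sketch: "data" may point into this object.
	SketchInlineArray(const SketchInlineArray&) = delete;
	SketchInlineArray& operator=(const SketchInlineArray&) = delete;
	~SketchInlineArray()
	{
		if (!IsInline())
			delete[] data;
	}

	bool IsInline() const { return data == inlineBuffer; }
	std::size_t Size() const { return size; }
	const T& operator[](std::size_t index) const { return data[index]; }

	void Add(const T& value)
	{
		if (size == capacity)
			Grow(capacity > 0 ? capacity * 2 : 4);
		data[size++] = value;
	}

	void RemoveLast()
	{
		if (size == 0)
			return;
		--size;
		// Assumption: move back to the inline buffer as soon as everything fits.
		if (!IsInline() && size <= N)
		{
			T* allocated = data;
			std::copy(allocated, allocated + size, inlineBuffer);
			delete[] allocated;
			data = inlineBuffer;
			capacity = N;
		}
	}

private:
	void Grow(std::size_t newCapacity)
	{
		T* allocated = new T[newCapacity];
		std::copy(data, data + size, allocated);
		if (!IsInline())
			delete[] data;
		data = allocated;
		capacity = newCapacity;
	}
};
```

<p>With <code>SketchInlineArray&lt;int, 2&gt;</code>, the first two <code>Add</code> calls never allocate, the third one moves the elements to the heap, and a following <code>RemoveLast</code> brings them back inline. The user code is the same in every case.</p>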
<h4 id="allocated-memory">Allocated Memory</h4>
<p>It is handled exclusively by <strong>arenas</strong>.
Arenas handle allocation following a particular algorithm. They are non-templated, and completely independent of the container itself.</p>
<blockquote>
<p><strong>To give you an example</strong>: I use them for reflection, where a single linear arena is assigned to all containers allocating reflection data. This means reflection has great data locality and reduces cache misses. It makes operations like checking inheritance much faster, since we access memory that is very close.</p>
</blockquote>
<p>An array can be assigned an arena during its construction.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span>MultiLinearArena arena;
</span></span><span style="display:flex;"><span>Array<span style="color:#f92672">&lt;</span>String<span style="color:#f92672">&gt;</span> names{arena};
</span></span></code></pre></div><p>When no arena is provided, a global or scope arena is used. <em>I should write another post about arenas…</em></p>
<p>With this design, an array can copy or move to another array with a different arena seamlessly, just like it does with the inline buffer. No extra code is needed to support arenas or inline memory, and if we need to control allocation, we can use an arena of our choice.</p>
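<p>As a rough illustration of what “non-templated and independent of the container” can mean, here is a sketch of an arena interface with a linear implementation. The names <code>SketchArena</code> and <code>SketchLinearArena</code> and their functions are hypothetical; Pipe&rsquo;s real arenas (such as <code>MultiLinearArena</code>) may look quite different.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interface, not Pipe's real arena API.
// It is not a template: it only deals in bytes, so any container can use it.
struct SketchArena
{
	virtual ~SketchArena() = default;
	virtual void* Alloc(std::size_t size, std::size_t align) = 0;
	virtual void Free(void* ptr, std::size_t size) = 0;
};

// Linear arena: hands out consecutive slices of a single block,
// which keeps everything allocated from it close together in memory.
class SketchLinearArena : public SketchArena
{
	std::vector<unsigned char> block;
	std::size_t used = 0;

public:
	explicit SketchLinearArena(std::size_t bytes) : block(bytes) {}

	void* Alloc(std::size_t size, std::size_t align) override
	{
		// Round the next free address up to "align" (assumed to be a power of two).
		const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(block.data());
		const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
		const std::uintptr_t aligned = (base + used + mask) & ~mask;
		const std::size_t offset = static_cast<std::size_t>(aligned - base);
		if (offset > block.size() || size > block.size() - offset)
			return nullptr;  // Out of space. A real arena could chain another block.
		used = offset + size;
		return block.data() + offset;
	}

	// Individual frees do nothing in this sketch; memory is released all at once.
	void Free(void*, std::size_t) override {}

	void Reset() { used = 0; }
	std::size_t Used() const { return used; }
};
```

<p>A container would only need to keep a <code>SketchArena&amp;</code> (or pointer) and call <code>Alloc</code> and <code>Free</code> on it, so its own type does not change when the allocation strategy does.</p>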
<h3 id="indexes">Indexes</h3>
<p>On the topic of indexes, there is not that much to mention.</p>
<p>Simply put, most of the functions in the array prefer indexes or counts over iterators.
This makes their use and implementation easier.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">Insert</span>(i32 atIndex, <span style="color:#66d9ef">const</span> Type<span style="color:#f92672">&amp;</span> value);
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">bool</span> <span style="color:#a6e22e">RemoveAt</span>(i32 index, <span style="color:#66d9ef">const</span> <span style="color:#66d9ef">bool</span> shouldShrink <span style="color:#f92672">=</span> true);
</span></span><span style="display:flex;"><span>Type<span style="color:#f92672">*</span> <span style="color:#a6e22e">At</span>(i32 index) <span style="color:#66d9ef">const</span>;
</span></span></code></pre></div><p>Of course, iterators are still supported to allow range-for or iterator algorithms, but the API prefers the use of indexes for simplicity.</p>
<h3 id="unsafe">Unsafe</h3>
<p>Sometimes when we work with arrays, we might know the inputs we provide are safe. For that reason, many functions in Pipe have an <strong>unsafe</strong> version which skips some safety checks. Use them at your own risk.</p>
<p>This can help gain back some performance in the grand scheme of things.</p>
<p>Very often, the safe versions simply call the unsafe version after running those checks:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#66d9ef">bool</span> <span style="color:#a6e22e">RemoveAt</span>(i32 index, <span style="color:#66d9ef">const</span> <span style="color:#66d9ef">bool</span> shouldShrink)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> (IsValidIndex(index))
</span></span><span style="display:flex;"><span>	{
</span></span><span style="display:flex;"><span>		RemoveAtUnsafe(index, shouldShrink);
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">return</span> true;
</span></span><span style="display:flex;"><span>	}
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">return</span> false;
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">Swap</span>(i32 firstIdx, i32 secondIdx)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> (IsValidIndex(firstIdx) <span style="color:#f92672">&amp;&amp;</span> IsValidIndex(secondIdx) <span style="color:#f92672">&amp;&amp;</span> firstIdx <span style="color:#f92672">!=</span> secondIdx)
</span></span><span style="display:flex;"><span>	{
</span></span><span style="display:flex;"><span>		SwapUnsafe(firstIdx, secondIdx);
</span></span><span style="display:flex;"><span>	}
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The names of unsafe functions always end with “Unsafe”. This makes it likely that the safe versions show up first while coding, and constantly reminds the user of their risk.</p>
<h3 id="plurals">Plurals</h3>
<p>It is always better to do an operation once for N items, than N times for N items.
This is why, in this design, many operations the array does (like adding or removing) can be performed in bulk.</p>
<p>You can add, remove, swap or sort many at once. This can provide a substantial performance benefit, while also simplifying the user code.</p>
<p>This can be done by providing another span or array to the function, or a range of indexes or iterators:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#75715e">// Some examples of bulk operations in InlineArray:
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">Append</span>(<span style="color:#66d9ef">const</span> IArray<span style="color:#f92672">&lt;</span>Type<span style="color:#f92672">&gt;&amp;</span> values);
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">Assign</span>(<span style="color:#66d9ef">const</span> IArray<span style="color:#f92672">&lt;</span>Type<span style="color:#f92672">&gt;&amp;</span> values);
</span></span><span style="display:flex;"><span>i32 <span style="color:#a6e22e">Remove</span>(<span style="color:#66d9ef">const</span> IArray<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Type<span style="color:#f92672">&gt;&amp;</span> items, <span style="color:#66d9ef">bool</span> shouldShrink <span style="color:#f92672">=</span> true);
</span></span><span style="display:flex;"><span>i32 <span style="color:#a6e22e">Remove</span>(<span style="color:#66d9ef">const</span> IArray<span style="color:#f92672">&lt;</span>i32<span style="color:#f92672">&gt;&amp;</span> indices, <span style="color:#66d9ef">bool</span> shouldShrink <span style="color:#f92672">=</span> true);
</span></span></code></pre></div><h2 id="final-notes">Final Notes</h2>
<p>For anyone interested in taking a look at the full implementation, you can find it <strong><a href="https://github.com/PipeRift/pipe/blob/feature/custom-arrays/Include/Pipe/PipeArrays.h">here (PipeArrays.h)</a></strong> along with the library (<strong><a href="https://github.com/PipeRift/pipe">Pipe</a></strong>).</p>
<p>I am sure I also forgot important details or didn&rsquo;t explain something correctly, so feel free to leave a comment and feedback, and if you happened to like it, let me know! I don&rsquo;t write often, but your encouragement will help :)</p>
<p>Finally, I am aware that topics like this have such a wide range of uses that the solution I described (which works for <em>my needs</em>) will be as good for some as it will be bad for others. Let&rsquo;s keep it a constructive conversation anyway.</p>
<p>Until next time, Muit.</p>
]]></content></item><item><title>A new approach to ECS APIs</title><link>https://muit.xyz/posts/ecs-new-approach-to-ecs-apis/</link><pubDate>Thu, 10 Feb 2022 00:00:00 +0000</pubDate><guid>https://muit.xyz/posts/ecs-new-approach-to-ecs-apis/</guid><description>Let’s talk about a different approach to ECS I have been rambling about lately. Well, specifically, about how we query entities, manage dependencies and access/modify data.
What is ECS you ask? Fair question! ECS (as Entity-Component-System) is an architectural pattern based on DOD (data-oriented design), where you have three main elements:
Entities: They are just an identifier and don’t hold any data. Components: Structs of data associated with a single entity (1 entity can have 1 component of each type).</description><content type="html"><![CDATA[<p>Let’s talk about a different approach to ECS I have been rambling about lately. Well, specifically, about how we query entities, manage dependencies and access/modify data.</p>
<h1 id="what-is-ecs-you-ask">What is ECS you ask?</h1>
<p>Fair question! <strong>ECS</strong> (as Entity-Component-System) is an architectural pattern based on DOD (data-oriented design), where you have three main elements:</p>
<ul>
<li><strong>Entities</strong>: They are just an identifier and don’t hold any data.</li>
<li><strong>Components</strong>: Structs of data associated with a single entity (1 entity can have 1 component of each type). They don’t have any code/logic.</li>
<li><strong>Systems</strong>: Functions that operate on entities and components.</li>
</ul>
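<p>If it helps, the three elements can be sketched in a few lines of plain C++. This is a deliberately naive illustration and not how any real ECS library stores its data; <code>World</code>, <code>Position</code>, <code>Velocity</code> and <code>MoveSystem</code> are made-up names.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Entity: only an identifier, it holds no data.
using Entity = std::uint32_t;

// Components: plain structs of data without logic.
struct Position { float x; };
struct Velocity { float dx; };

// Hypothetical storage: one map per component type, keyed by entity,
// so an entity can have at most one component of each type.
struct World
{
	Entity nextId = 0;
	std::unordered_map<Entity, Position> positions;
	std::unordered_map<Entity, Velocity> velocities;

	Entity Create() { return nextId++; }
};

// System: a function that operates on the entities that have the components it needs.
void MoveSystem(World& world, float deltaTime)
{
	for (const auto& pair : world.velocities)
	{
		auto it = world.positions.find(pair.first);
		if (it != world.positions.end())
		{
			it->second.x += pair.second.dx * deltaTime;
		}
	}
}
```

<p>An entity with both a <code>Position</code> and a <code>Velocity</code> gets moved by <code>MoveSystem</code>, while an entity with only a <code>Position</code> is left untouched.</p>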
<p>I could explain ECS in greater detail, but there are plenty of resources online already that will do a better job than me. <a href="https://www.youtube.com/watch?v=0_Byw9UMn9g">This talk</a> is a good start, and for more resources, you can also <a href="https://github.com/SanderMertens/ecs-faq">read this</a>.</p>
<p>I personally also like to consider <strong>Utilities</strong> as the fourth, secret child of ECS.
Utilities are functions that can be reused between systems. Any code that is not part of a system is a utility. One example could be <em>hierarchy</em>, where multiple systems can <em>add, remove, or transfer children between entities</em>.</p>
<h1 id="current-approach-to-ecs-apis">Current approach to ECS APIs</h1>
<p>Now that we know what ECS is and the basics of how it works, let&rsquo;s talk about how we could improve it.</p>
<p>In most ECS libraries I have used so far, there is always the concept of a <strong>view</strong>, or a <strong>filter</strong>.
This is a tool that allows fast iteration of entities following a set of conditions. You can say, for example, &ldquo;iterate all entities with &lsquo;Player&rsquo; and &lsquo;Movement&rsquo; components, but ignore those with &lsquo;Frozen&rsquo; component&rdquo;.</p>
<p>Implementation details may differ, but I will be using the popular library <strong>entt</strong> as an example (it&rsquo;s great, check it out). In this library, a “view” caches pools from the world when it is created, and uses them to check for entities matching some included and excluded components.</p>
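<p>The general idea of such a filter can be sketched without any library. The code below is not entt&rsquo;s implementation: <code>Pool</code> and <code>SketchView</code> are hypothetical, and a pool here only records which entities have a component, not the component data.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

using Entity = std::uint32_t;
// Hypothetical pool: the set of entities that have a given component.
using Pool = std::unordered_set<Entity>;

// Hypothetical view: caches pointers to the pools it filters by.
struct SketchView
{
	std::vector<const Pool*> included;
	std::vector<const Pool*> excluded;

	// An entity matches if it is in every included pool and in no excluded pool.
	bool Matches(Entity entity) const
	{
		for (const Pool* pool : included)
		{
			if (pool->count(entity) == 0)
				return false;
		}
		for (const Pool* pool : excluded)
		{
			if (pool->count(entity) != 0)
				return false;
		}
		return true;
	}

	// Walks the first included pool and keeps the entities that match.
	std::vector<Entity> Collect() const
	{
		std::vector<Entity> result;
		if (included.empty())
			return result;
		for (Entity entity : *included.front())
		{
			if (Matches(entity))
				result.push_back(entity);
		}
		return result;
	}
};
```

<p>With pools for &lsquo;Player&rsquo;, &lsquo;Movement&rsquo; and &lsquo;Frozen&rsquo;, a view including the first two and excluding the third yields exactly the entities from the earlier example sentence.</p>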
<h3 id="problems-sharing-code">Problems sharing code</h3>
<p>So let&rsquo;s make an example with <strong>entt</strong> where we move agents (a system):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">MoveAgents</span>(entt<span style="color:#f92672">::</span>registry<span style="color:#f92672">&amp;</span> registry, <span style="color:#66d9ef">float</span> deltaTime)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	<span style="color:#75715e">// We create a view matching all agents with movement and transform components
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>	<span style="color:#66d9ef">auto</span> view <span style="color:#f92672">=</span> registry.view<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Agent, <span style="color:#66d9ef">const</span> Movement, Transform<span style="color:#f92672">&gt;</span>();
</span></span><span style="display:flex;"><span>	<span style="color:#75715e">// We iterate all entities in the view
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>	<span style="color:#66d9ef">for</span>(Id entity : view)
</span></span><span style="display:flex;"><span>	{
</span></span><span style="display:flex;"><span>		<span style="color:#75715e">// We get components and apply position based on velocity
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>		<span style="color:#66d9ef">const</span> <span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> movement <span style="color:#f92672">=</span> view.get<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Movement<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> transform <span style="color:#f92672">=</span> view.get<span style="color:#f92672">&lt;</span>Transform<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>		transform.position <span style="color:#f92672">+=</span> movement.velocity <span style="color:#f92672">*</span> deltaTime;
</span></span><span style="display:flex;"><span>	}
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Okay, so far, we are just fine.
But what if we also have props that can move, but only when they are enabled?</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">MoveProps</span>(entt<span style="color:#f92672">::</span>registry<span style="color:#f92672">&amp;</span> registry, <span style="color:#66d9ef">float</span> deltaTime)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">auto</span> view <span style="color:#f92672">=</span> registry.view<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Prop, <span style="color:#66d9ef">const</span> Movement, Transform<span style="color:#f92672">&gt;</span>();
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">for</span>(Id entity : view)
</span></span><span style="display:flex;"><span>	{
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">const</span> <span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> prop <span style="color:#f92672">=</span> view.get<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Prop<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">if</span> (prop.isEnabled)
</span></span><span style="display:flex;"><span>		{
</span></span><span style="display:flex;"><span>			<span style="color:#75715e">// Can we reuse this?
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>			<span style="color:#66d9ef">const</span> <span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> movement <span style="color:#f92672">=</span> view.get<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Movement<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>			<span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> transform <span style="color:#f92672">=</span> view.get<span style="color:#f92672">&lt;</span>Transform<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>			transform.position <span style="color:#f92672">+=</span> movement.velocity <span style="color:#f92672">*</span> deltaTime;
</span></span><span style="display:flex;"><span>		}
</span></span><span style="display:flex;"><span>	}
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Well, now we have some duplicated code. We could extract it into a utility. But how?</p>
<p>If we wanted to share code as utilities, we would be extremely limited, especially if we want to track which data we are reading and writing, which is crucial for scheduling (more on that later).</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#75715e">// We could use references, but it&#39;s not very practical since we need to get the components outside anyway
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">ApplyMovement</span>(<span style="color:#66d9ef">const</span> Movement<span style="color:#f92672">&amp;</span> movement, Transform<span style="color:#f92672">&amp;</span> transform, <span style="color:#66d9ef">float</span> deltaTime)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	transform.position <span style="color:#f92672">+=</span> movement.velocity <span style="color:#f92672">*</span> deltaTime;
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// We could pass the registry, but then we lose the fast access to pools from views.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">// Also, we do not know from outside which components we are reading and writing
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">ApplyMovement</span>(entt<span style="color:#f92672">::</span>registry<span style="color:#f92672">&amp;</span> registry, Id entity, <span style="color:#66d9ef">float</span> deltaTime)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">const</span> <span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> movement <span style="color:#f92672">=</span> registry.get<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Movement<span style="color:#f92672">&gt;</span>(entity); <span style="color:#75715e">// Accessing component directly through world is slow
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>	<span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> transform <span style="color:#f92672">=</span> registry.get<span style="color:#f92672">&lt;</span>Transform<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>	transform.position <span style="color:#f92672">+=</span> movement.velocity <span style="color:#f92672">*</span> deltaTime;
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// We could pass the view as a template parameter.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">// But templates need to be defined where they are used, meaning all shared functions will most likely need to live in a header.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">// To that, you add different views for the same function, and you get slower compile times.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">// Outside of templates, Views also are not intended to control access, and they can not do all the things you can do with the world.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">template</span><span style="color:#f92672">&lt;</span><span style="color:#66d9ef">typename</span> View<span style="color:#f92672">&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">void</span> ApplyMovement(View view, Id entity, <span style="color:#66d9ef">float</span> deltaTime)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">const</span> <span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> movement <span style="color:#f92672">=</span> view.get<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Movement<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> transform <span style="color:#f92672">=</span> view.get<span style="color:#f92672">&lt;</span>Transform<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>	transform.position <span style="color:#f92672">+=</span> movement.velocity <span style="color:#f92672">*</span> deltaTime;
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Along with the problems sharing code (utilities) between systems, you will also have a really hard time tracking dependencies as your project grows if you want to do any sort of scheduling.</p>
<h3 id="problems-scheduling">Problems scheduling</h3>
<p>As I mentioned in the previous section, scheduling is a huge problem, and we should simplify it.</p>
<p>Scheduling helps us organize hundreds of system functions so they can run safely across multiple threads. To achieve that, we need to know where we read and where we modify components:</p>
<ul>
<li>We can safely <strong>read</strong> components of the same type from many threads at the same time.</li>
<li>We can&rsquo;t safely <strong>read</strong> or <strong>write</strong> components of a type <strong>while</strong> any other thread is <strong>writing</strong> them.</li>
</ul>
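<p>The rules above boil down to a conflict check between the components two systems read and write. Here is a minimal sketch of that check, assuming a hypothetical <code>SystemDeps</code> description where component types are identified by name. It is not how any particular scheduler does it, and it also treats two writers of the same component as a conflict:</p>

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of a system: the component types it reads and writes.
struct SystemDeps
{
	std::vector<std::string> reads;
	std::vector<std::string> writes;
};

// True if any name in 'a' also appears in 'b'.
bool Overlaps(const std::vector<std::string>& a, const std::vector<std::string>& b)
{
	return std::any_of(a.begin(), a.end(), [&b](const std::string& name) {
		return std::find(b.begin(), b.end(), name) != b.end();
	});
}

// Two systems can run at the same time only if neither of them
// writes a component that the other one reads or writes.
bool CanRunInParallel(const SystemDeps& a, const SystemDeps& b)
{
	return !Overlaps(a.writes, b.reads)
	    && !Overlaps(a.writes, b.writes)
	    && !Overlaps(b.writes, a.reads);
}
```

<p>A scheduler can only build a <code>SystemDeps</code> for a function if that function declares what it touches, which is exactly what is missing when we pass the whole registry around.</p>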
<p>We can, of course, schedule by hand, but this quickly becomes unmaintainable. That&rsquo;s why there are many ways to automate it. But, as I said, automation is only possible if we can tell from the outside which components a function reads and writes.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#75715e">// If we pass around the registry, we don&#39;t know our dependencies
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">// We don&#39;t know which components this function is accessing and modifying
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">MoveProps</span>(entt<span style="color:#f92672">::</span>registry<span style="color:#f92672">&amp;</span> registry, <span style="color:#66d9ef">float</span> deltaTime) {}
</span></span></code></pre></div><h3 id="problems-controlling-data-flow">Problems controlling data-flow</h3>
<p>One of the points of DOD is that all code serves a single purpose: It converts data (input) into other data (output). “It&rsquo;s all about the data.”</p>
<p>A view that we can mostly only iterate limits us when we want to write proper algorithms that operate on data (efficiently) in multiple steps.</p>
<h1 id="fixing-the-problems">Fixing the problems</h1>
<p>Let&rsquo;s see what we need:</p>
<ul>
<li>We need to be able to <strong>easily</strong> share code</li>
<li>We need to express dependencies when reading and writing components, allowing us to schedule</li>
<li>We need to be able to apply complex data flows, allowing more cache- and CPU-friendly code</li>
<li>It has to be blazing fast</li>
<li>Errors must be <strong>simple</strong> and straightforward <em>&hellip;proceeds to look at templates with disapproval</em></li>
</ul>
<p>I experimented with a solution to this for a while and ended up implementing it in
<a href="https://github.com/PipeRift/rift"><strong>Rift</strong></a>.
It solves all the points above, so let&rsquo;s rebuild the previous examples with it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#75715e">// We pass an Access with the types we can write, and those we can only read (const)
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">MoveProps</span>(TAccess<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Prop, <span style="color:#66d9ef">const</span> Movement, Transform<span style="color:#f92672">&gt;</span> access, <span style="color:#66d9ef">float</span> deltaTime)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">for</span>(Id entity : ListAll<span style="color:#f92672">&lt;</span>Prop, Movement, Transform<span style="color:#f92672">&gt;</span>(access))
</span></span><span style="display:flex;"><span>	{
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">const</span> <span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> prop <span style="color:#f92672">=</span> access.Get<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Prop<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">if</span> (prop.isEnabled)
</span></span><span style="display:flex;"><span>		{
</span></span><span style="display:flex;"><span>			<span style="color:#75715e">// Can we reuse this? Yes
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>			<span style="color:#75715e">//const auto&amp; movement = access.Get&lt;const Movement&gt;(entity);
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>			<span style="color:#75715e">//auto&amp; transform = access.Get&lt;Transform&gt;(entity);
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>			<span style="color:#75715e">//transform.position += movement.velocity * deltaTime;
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span>			<span style="color:#75715e">// So, let&#39;s reuse it
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>			ApplyMovement(access, entity, deltaTime);
</span></span><span style="display:flex;"><span>		}
</span></span><span style="display:flex;"><span>	}
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// The parent access (MoveProps) must have these components.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">// If it doesn&#39;t, we will get proper errors telling us what&#39;s missing.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">ApplyMovement</span>(TAccess<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Movement, Transform<span style="color:#f92672">&gt;</span> access, Id entity, <span style="color:#66d9ef">float</span> deltaTime)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">const</span> <span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> movement <span style="color:#f92672">=</span> access.Get<span style="color:#f92672">&lt;</span><span style="color:#66d9ef">const</span> Movement<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">auto</span><span style="color:#f92672">&amp;</span> transform <span style="color:#f92672">=</span> access.Get<span style="color:#f92672">&lt;</span>Transform<span style="color:#f92672">&gt;</span>(entity);
</span></span><span style="display:flex;"><span>	transform.position <span style="color:#f92672">+=</span> movement.velocity <span style="color:#f92672">*</span> deltaTime;
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h3 id="access">Access</h3>
<p>An access represents a set of components for efficient access and dependency tracking. It also can’t be directly iterated (by design). We have other tools for that.</p>
<ul>
<li>It is very cheap to copy (just one pool pointer per component type)</li>
<li>It provides instant access into component pools</li>
<li>It is much simpler and less template-heavy than views</li>
<li>It can be constructed implicitly from the ECS world or from other, bigger accesses.</li>
</ul>
<p>An access comes in two flavors: compile-time assisted <code>TAccess&lt;Types&gt;</code> or runtime-based <code>Access</code>.</p>
<p>It also makes sense to pass them to functions as const references. They are cheap to copy, yes, but we might not need to copy them at all. That&rsquo;s why I added an alias, <code>TAccessRef&lt;Types&gt;</code>, which is essentially the same as <code>const TAccess&lt;Types&gt;&amp;</code>. It&rsquo;s just easier to write.</p>
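<p>To make the idea more tangible, here is a minimal sketch of how a compile-time access could be built. This is <strong>not</strong> Rift&rsquo;s implementation: <code>Pool</code>, <code>Contains</code> and <code>CanAccess</code> are hypothetical helpers, pools are plain hash maps, and the components only hold a single float. It shows the two properties described above: copying an access only copies pool pointers, and asking for a component the access doesn&rsquo;t list (or asking for write access to a const one) fails with a single <code>static_assert</code> message.</p>

```cpp
#include <cstdint>
#include <tuple>
#include <type_traits>
#include <unordered_map>

using Id = std::uint32_t;

// Hypothetical component storage: one pool per component type.
template <typename T>
struct Pool
{
	std::unordered_map<Id, T> data;
};

template <typename T, typename... Ts>
inline constexpr bool Contains = (std::is_same_v<T, Ts> || ...);

// A request for T is valid if the access lists T itself,
// or if T is const and the access lists the mutable type.
template <typename T, typename... Ts>
inline constexpr bool CanAccess =
    Contains<T, Ts...> || (std::is_const_v<T> && Contains<std::remove_const_t<T>, Ts...>);

template <typename... Types>
class TAccess
{
public:
	explicit TAccess(Pool<std::remove_const_t<Types>>&... pools) : poolPtrs{&pools...} {}

	// Implicit construction from a bigger access: only pool pointers are copied.
	template <typename... Others>
	TAccess(const TAccess<Others...>& other) : poolPtrs{other.template GetPool<Types>()...}
	{}

	template <typename T>
	Pool<std::remove_const_t<T>>* GetPool() const
	{
		static_assert(CanAccess<T, Types...>, "This access does not allow that component type");
		return std::get<Pool<std::remove_const_t<T>>*>(poolPtrs);
	}

	// Throws std::out_of_range if the entity doesn't have the component.
	template <typename T>
	T& Get(Id id) const
	{
		return GetPool<T>()->data.at(id);
	}

private:
	std::tuple<Pool<std::remove_const_t<Types>>*...> poolPtrs;
};

// Simplified stand-ins for the components used in the examples.
struct Prop { bool isEnabled = true; };
struct Movement { float velocity = 0.f; };
struct Transform { float position = 0.f; };

void ApplyMovement(TAccess<const Movement, Transform> access, Id entity, float deltaTime)
{
	const auto& movement = access.Get<const Movement>(entity);
	auto& transform = access.Get<Transform>(entity);
	transform.position += movement.velocity * deltaTime;
}
```

<p>With this sketch, a function holding a <code>TAccess&lt;const Prop, const Movement, Transform&gt;</code> can call <code>ApplyMovement(access, entity, deltaTime)</code> directly, while one holding only <code>TAccess&lt;const Transform&gt;</code> would not compile.</p>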
<h3 id="filtering-entities">Filtering entities</h3>
<p>If an access can&rsquo;t iterate on its own, how do we do it?</p>
<p>Iteration is done by creating and modifying lists of ids:</p>
<ul>
<li><code>ListAll&lt;Types&gt;(access)</code>: Returns all entity ids containing all the provided components.</li>
<li><code>ListAny&lt;Types&gt;(access)</code>: Returns all entity ids containing at least one of the provided components</li>
</ul>
<p>Then we can also apply new filters like excluding components:</p>
<ul>
<li><code>RemoveIf&lt;Types&gt;(access, ids)</code>: Exclude entities having a component</li>
<li><code>RemoveIfNot&lt;Types&gt;(access, ids)</code>: Exclude entities not having a component</li>
</ul>
<p>It should be mentioned that these functions don&rsquo;t ensure the order is kept by default (for performance), but we can use their counterparts for that:</p>
<ul>
<li><code>RemoveIfStable&lt;Types&gt;(access, ids)</code>: Exclude entities having a component, keeping the order</li>
<li><code>RemoveIfNotStable&lt;Types&gt;(access, ids)</code>: Exclude entities not having a component, keeping the order</li>
</ul>
<p>The potential of this is that we are just operating on a list of ids, and we are not limited to the functions above. It&rsquo;s just &ldquo;filtering&rdquo; lists of ids.</p>
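<p>The difference between the default and the stable variants can be sketched in a few lines. The code below is a simplified assumption, not Rift&rsquo;s code: a pool is reduced to a hypothetical <code>IdSet</code> (the set of ids that own the component), and the two functions take that set directly instead of an access.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using Id = std::uint32_t;
// Hypothetical stand-in for a component pool: the ids that own the component.
using IdSet = std::unordered_set<Id>;

// Unstable removal: each rejected id is overwritten by the last id
// of the list, so nothing is shifted but the order is lost.
void RemoveIfNot(const IdSet& pool, std::vector<Id>& ids)
{
	for (std::size_t i = 0; i < ids.size();)
	{
		if (pool.count(ids[i]) == 0)
		{
			ids[i] = ids.back();
			ids.pop_back();
		}
		else
		{
			++i;
		}
	}
}

// Stable removal: the remaining ids keep their relative order,
// at the cost of moving them towards the front one by one.
void RemoveIfNotStable(const IdSet& pool, std::vector<Id>& ids)
{
	ids.erase(std::remove_if(ids.begin(), ids.end(),
	              [&pool](Id id) { return pool.count(id) == 0; }),
	    ids.end());
}
```

<p>Both versions leave the same ids in the list. Only the stable one guarantees they come out in the order they went in.</p>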
<p>One example could be in 






  


<a href="https://github.com/PipeRift/rift"
   >Rift</a>, where the compiler precaches two lists, one for classes and one for structs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span>TArray<span style="color:#f92672">&lt;</span>AST<span style="color:#f92672">::</span>Id<span style="color:#f92672">&gt;</span> classes, structs;
</span></span><span style="display:flex;"><span>AST<span style="color:#f92672">::</span>Hierarchy<span style="color:#f92672">::</span>GetChildren(ast, moduleId, classes);
</span></span><span style="display:flex;"><span>AST<span style="color:#f92672">::</span>RemoveIfNot<span style="color:#f92672">&lt;</span>CType<span style="color:#f92672">&gt;</span>(ast, classes);
</span></span><span style="display:flex;"><span>structs <span style="color:#f92672">=</span> classes;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>AST<span style="color:#f92672">::</span>RemoveIfNot<span style="color:#f92672">&lt;</span>CClassDecl<span style="color:#f92672">&gt;</span>(ast, classes);
</span></span><span style="display:flex;"><span>AST<span style="color:#f92672">::</span>RemoveIfNot<span style="color:#f92672">&lt;</span>CStructDecl<span style="color:#f92672">&gt;</span>(ast, structs);
</span></span></code></pre></div><p>As you can see, it is filtering different components to finish with those two lists of types.</p>
<p>It also shows that filtering can be done directly from the world (<code>ast</code> in the example) without an access. You won&rsquo;t get the benefit of cached pools, but it will still be really fast to iterate: <br>
<code>ListAll&lt;Types&gt;(world)</code> <code>RemoveIf&lt;Types&gt;(world, ids)</code> <code>RemoveIfNot&lt;Types&gt;(world, ids)</code></p>
<h2 id="performance">Performance</h2>
<p>I mentioned many reasons why this style of API is attractive, but there is another one. It is fast.</p>
<p>When I implemented accesses for 






  


<a href="https://github.com/PipeRift/rift"
   ><strong>Rift</strong></a>, I already had filters (very similar to entt’s views). So I took the chance to do a one-to-one comparison, with the following results:</p>
<p>In <strong>debug</strong>, access filtering iterates up to 3 times faster than views.</p>
<p>



<img src="/Assets/Img/ecs-access-debug.png"
  
   alt="Access in Debug" 
/></p>
<p>In <strong>release</strong> the difference is tighter: access filtering is between 35% and 50% faster in most runs.</p>
<p>



<img src="/Assets/Img/ecs-access-release.png"
  
   alt="Access in Release" 
/></p>
<p>It should be noted that this benchmark runs an empty iteration loop. For views, this means their pool checks are very close in execution. In other words, it is their <strong>ideal scenario</strong>, unrealistically in their favor. Even so, they still seem to run slower. Why is that?</p>
<h3 id="why-is-it-faster">Why is it faster?</h3>
<p>Unlike views, accesses don&rsquo;t need to find their pools again and again every time they get created. Most of the time, an access is created from another one, which is literally just copying the relevant pool pointers.</p>
<p>However, this is not where most of the performance benefit comes from.</p>
<p>It comes from the order of the checks. With views, each entity is checked against all the pools at once, while with <strong>ListAll</strong> all ids are checked against one pool after another:</p>
<p><strong>Views</strong></p>
<ul>
<li>Iterate all ids in smallest pool
<ul>
<li>Check that the id has components A, B, C</li>
</ul>
</li>
</ul>
<p><strong>Access Filtering</strong></p>
<ul>
<li>Get all ids from smallest pool</li>
<li>Remove those that don&rsquo;t have component A</li>
<li>Remove those that don&rsquo;t have component B</li>
<li>Remove those that don&rsquo;t have component C</li>
</ul>
<p>This uses a single pool and its hash-set at a time, making it more cache-friendly.</p>
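<p>The two strategies can be put side by side. This is only a sketch under simplified assumptions (pools are plain hash sets of ids, and <code>FilterLikeView</code> and <code>FilterPoolByPool</code> are hypothetical names), so it illustrates the order of the checks rather than the measured performance:</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

using Id = std::uint32_t;
using IdSet = std::unordered_set<Id>;

// View style: each id is checked against every pool before moving to the next id.
std::vector<Id> FilterLikeView(
    const std::vector<Id>& smallest, const std::vector<const IdSet*>& pools)
{
	std::vector<Id> result;
	for (Id id : smallest)
	{
		const bool hasAll = std::all_of(pools.begin(), pools.end(),
		    [id](const IdSet* pool) { return pool->count(id) != 0; });
		if (hasAll)
		{
			result.push_back(id);
		}
	}
	return result;
}

// Access filtering style: the whole list is filtered against one pool at a time,
// so each pass only touches a single hash set.
std::vector<Id> FilterPoolByPool(std::vector<Id> ids, const std::vector<const IdSet*>& pools)
{
	for (const IdSet* pool : pools)
	{
		ids.erase(std::remove_if(ids.begin(), ids.end(),
		              [pool](Id id) { return pool->count(id) == 0; }),
		    ids.end());
	}
	return ids;
}
```

<p>Both functions return the same ids. The difference is only in which memory gets touched, and in what order.</p>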
<br>
<p>I hope this post was not too dense. It is quite a specific topic, after all.</p>
<p>Consider having a look at 






  


<a href="https://github.com/PipeRift/rift"
   ><strong>Rift</strong></a>. It would be incredibly helpful to get your ideas, feedback and/or code contributions!</p>
]]></content></item><item><title>Implementing a general-use arena</title><link>https://muit.xyz/posts/memory-implementing-a-general-arena/</link><pubDate>Thu, 03 Feb 2022 00:00:00 +0000</pubDate><guid>https://muit.xyz/posts/memory-implementing-a-general-arena/</guid><description>Now that we have learned about arenas and allocators, we can get our hands dirty with an implementation of an arena.
Best Fit Arena You see, for the last couple of months, I&amp;rsquo;ve been updating RiftCore with new features. RiftCore is a cross-platform framework I use for C++ projects, and it lacked some memory management.
So the time came to design a general-purpose arena!
This article will describe the design and implementation of a Best Fit Arena.</description><content type="html"><![CDATA[<p>Now that we have learned about 






  


<a href="/posts/memory-introduction-to-allocators-and-arenas/"
   >arenas and allocators</a>, we can get our hands dirty with an implementation of an arena.</p>
<h2 id="best-fit-arena">Best Fit Arena</h2>
<p>You see, for the last couple of months, I&rsquo;ve been updating 






  


<a href="https://github.com/PipeRift/rift-core"
   >RiftCore</a> with new features.
<strong>RiftCore</strong> is a cross-platform framework I use for C++ projects, and it lacked some memory management.</p>
<p>So the time came to design a general-purpose arena!</p>
<p>This article will describe the design and implementation of a <strong>Best Fit Arena</strong>.
Feel free to come up with a better name though (and put it in the comments below!)</p>
<h2 id="general-purpose">General-purpose?</h2>
<p>A general-purpose allocator (or arena) must be able to work in all scenarios without any major limitation.
As such, it has to be able to:</p>
<ul>
<li><strong>Allocate</strong> in any order and any size</li>
<li><strong>Deallocate</strong> in any order</li>
<li>Use (and reuse) all space available</li>
<li>Minimize fragmentation</li>
</ul>
<p>In RiftCore, arenas always receive the <code>size</code> of the allocation in their <code>Free()</code> function.
This opens the door to some optimizations but, don&rsquo;t worry, the BestFitArena can be adapted to avoid this pattern.</p>
<h2 id="implementation">Implementation</h2>
<p>A <strong>BestFitArena</strong> works by <strong>tracking all unused spaces</strong>, called free slots.




<img src="/Assets/Img/best-fit-arena-slot-ids.png"
  
   alt="BestFitArena" 
/></p>
<p>Let&rsquo;s go through what we see in this picture:</p>
<ul>
<li>Like most allocators, we have one or multiple memory blocks of pre-allocated memory.</li>
<li>We also keep a list of <code>FreeSlots</code>, sorted by size. Bigger first.</li>
<li>We don&rsquo;t track allocations in any way. No headers, no offsets and no sizes.</li>
</ul>
<p>



<img src="/Assets/Img/best-fit-arena-slot-ptrs.png"
  
   alt="BestFitArena Slot Pointers" 
/>
Seen in more detail, each slot stores a pointer to the start of its memory, plus its size.</p>
<p>This algorithm has <strong>zero overhead</strong> when fragmentation is low. The less fragmentation, the more performant it is.
However, it is also designed to minimize fragmentation and, as you will see later, even in a scenario with a lot of fragmentation, performance is still excellent.</p>
<h3 id="allocation">Allocation</h3>
<p><strong>Allocation</strong> will always pick the smallest free slot that fits the requested size and take the pointer from it.
Then, the slot is shrunk by removing the used space from it.




<img src="/Assets/Img/best-fit-arena-allocation.png"
  
   alt="BestFitArena Allocate" 
/></p>
<h4 id="find-smallest-slot">Find Smallest Slot</h4>
<p>Before anything else, we check if the arena is marked as pending sort.
This flag is an optimization that prevents unnecessary sorts on consecutive <code>Free</code> calls.
We also shrink the slots at this point if necessary.</p>
<p>Once we know all slots are sorted, we perform a 






  


<a href="https://www.geeksforgeeks.org/binary-search/"
   >binary search</a> by size.
The binary search gives us a complexity of O(log N).</p>
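<p>Put together, the allocation path could look roughly like the sketch below. It is a reconstruction from the description above, not RiftCore&rsquo;s code: <code>Slot</code>, <code>BestFitSketch</code>, <code>FindSmallestSlot</code> and <code>NoSlot</code> are hypothetical names, alignment is ignored, and the arena doesn&rsquo;t own its memory block. Since slots are sorted bigger first, all the slots that fit form a prefix of the list, and the last one of that prefix is the smallest that fits.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Slot
{
	std::uint8_t* start = nullptr;
	std::size_t size = 0;
};

constexpr std::size_t NoSlot = static_cast<std::size_t>(-1);

// 'slots' must be sorted by size, bigger first.
// Returns the index of the smallest slot that fits 'size', or NoSlot.
std::size_t FindSmallestSlot(const std::vector<Slot>& slots, std::size_t size)
{
	// Binary search for the first slot that is too small
	const auto firstTooSmall = std::partition_point(slots.begin(), slots.end(),
	    [size](const Slot& slot) { return slot.size >= size; });
	if (firstTooSmall == slots.begin())
	{
		return NoSlot;    // Even the biggest slot is too small
	}
	return static_cast<std::size_t>(firstTooSmall - slots.begin()) - 1;
}

struct BestFitSketch
{
	std::vector<Slot> freeSlots;    // Sorted by size unless 'pendingSort' is set
	bool pendingSort = false;

	void* Allocate(std::size_t size)
	{
		assert(size > 0);
		if (pendingSort)
		{
			// Drop slots that were fully consumed, then restore the order
			freeSlots.erase(std::remove_if(freeSlots.begin(), freeSlots.end(),
			                    [](const Slot& slot) { return slot.size == 0; }),
			    freeSlots.end());
			std::sort(freeSlots.begin(), freeSlots.end(),
			    [](const Slot& a, const Slot& b) { return a.size > b.size; });
			pendingSort = false;
		}

		const std::size_t index = FindSmallestSlot(freeSlots, size);
		if (index == NoSlot)
		{
			return nullptr;
		}

		// Take the memory from the start of the slot and shrink it
		Slot& slot = freeSlots[index];
		void* const ptr = slot.start;
		slot.start += size;
		slot.size -= size;
		pendingSort = true;    // The slot shrank, so the order may be stale
		return ptr;
	}
};
```

<p>Sorting lazily means several consecutive allocations or frees only pay for a single sort, right before the next search needs it.</p>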
<h3 id="free">Free</h3>
<p><strong>Free</strong> expands the free slots that &ldquo;touch&rdquo; the freed memory: they absorb it and grow.</p>
<p>We know the size of the allocation because it is contained in the free slots list, which we check anyway.




<img src="/Assets/Img/best-fit-arena-free.png"
  
   alt="BestFitArena Free" 
/></p>
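<p>As a rough sketch of that merge step, assuming the size is passed in (as RiftCore arenas do) and using a plain linear scan instead of whatever lookup the real arena uses, <code>Free</code> could be written like this. <code>Slot</code> and the free function <code>Free</code> are hypothetical names for illustration:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Slot
{
	std::uint8_t* start = nullptr;
	std::size_t size = 0;
};

// Returns the freed range to 'freeSlots', merging it with any slot that touches it.
// The list is left unsorted, so the caller should mark the arena as pending sort.
void Free(std::vector<Slot>& freeSlots, std::uint8_t* ptr, std::size_t size)
{
	assert(ptr != nullptr && size > 0);
	Slot merged{ptr, size};

	for (std::size_t i = 0; i < freeSlots.size();)
	{
		const Slot slot = freeSlots[i];
		const bool touchesBefore = slot.start + slot.size == merged.start;
		const bool touchesAfter = merged.start + merged.size == slot.start;
		if (touchesBefore || touchesAfter)
		{
			// Absorb the neighbour into the merged range
			if (touchesBefore)
			{
				merged.start = slot.start;
			}
			merged.size += slot.size;

			// Remove the absorbed slot (order doesn't matter, we sort later)
			freeSlots[i] = freeSlots.back();
			freeSlots.pop_back();
		}
		else
		{
			++i;
		}
	}
	freeSlots.push_back(merged);
}
```

<p>Freeing the gap between two free slots collapses all three ranges into a single slot, which is how this kind of arena keeps fragmentation down.</p>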
<br>
<p><em>PS</em>: This is a post I never published when I wrote it. So some details might be missing but feel free to ask any questions :)</p>
]]></content></item><item><title>Introduction to allocators and arenas</title><link>https://muit.xyz/posts/memory-introduction-to-allocators-and-arenas/</link><pubDate>Tue, 30 Mar 2021 00:00:00 +0000</pubDate><guid>https://muit.xyz/posts/memory-introduction-to-allocators-and-arenas/</guid><description>Lately, I have been playing around with the implementation of custom allocators and arenas to replace native allocations on my C++ projects.
Wow! Stop right there, Miguel. This line already deserves some introductions! Let&amp;rsquo;s talk about allocators.
Crash course on allocations To keep this brief, I will assume that we have some experience with C++ and heap allocation (malloc and new).
An allocation is when we request a pointer to a block of memory of a specified size.</description><content type="html"><![CDATA[<p>Lately, I have been playing around with the implementation of custom allocators and arenas to replace native allocations on my C++ projects.</p>
<p>Wow! Stop right there, Miguel. This line already deserves some introductions!
Let&rsquo;s talk about allocators.</p>
<h2 id="crash-course-on-allocations">Crash course on allocations</h2>
<p>To keep this brief, I will assume that we have some experience with C++ and heap allocation (<code>malloc</code> and <code>new</code>).</p>
<p>An <strong>allocation</strong> is when we request a pointer to a block of memory of a specified size.
When we use <code>malloc</code> or <code>new</code> we are getting this block of memory from the heap.</p>
<p>When we <strong>deallocate</strong> a pointer (calling <code>free</code> or <code>delete</code>), we declare that we no longer need its block of memory, and it becomes available once again.</p>
<h2 id="what-are-allocators-and-arenas">What are Allocators and Arenas</h2>
<p>The definition of an allocator is somewhat flexible. It involves the encapsulation of allocation and deallocation of memory.</p>
<p>The allocators provided by the STD (the C++ standard library) are templated objects bound to a type.
For example, <code>std::vector</code> can have different allocators.</p>
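<p>One standard way to try this out, assuming a C++17 compiler, is the <code>std::pmr</code> family: <code>std::pmr::vector</code> is a <code>std::vector</code> whose allocator forwards to a memory resource chosen at runtime. The sketch below (the function names are mine) runs the same code once on whatever resource is passed in and once on a small local buffer:</p>

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Fills a vector that allocates from 'resource' and returns the sum of its elements.
int SumWithResource(std::pmr::memory_resource& resource)
{
	// std::pmr::vector<int> is std::vector<int, std::pmr::polymorphic_allocator<int>>
	std::pmr::vector<int> values(&resource);
	for (int i = 1; i <= 4; ++i)
	{
		values.push_back(i);
	}
	return std::accumulate(values.begin(), values.end(), 0);
}

int SumOnLocalBuffer()
{
	// Every allocation is carved out of this buffer. With null_memory_resource
	// as upstream, running out of space throws instead of falling back to the heap.
	std::array<std::byte, 256> buffer;
	std::pmr::monotonic_buffer_resource arena(
	    buffer.data(), buffer.size(), std::pmr::null_memory_resource());
	return SumWithResource(arena);
}
```

<p>The container code doesn&rsquo;t change at all. Only where its memory comes from does.</p>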
<p>In game development, we also use allocators as global memory managers.
Using them, we can optimize allocations for specific parts of a game engine.
For example, we can have an allocator that contains one render frame of data and gets cleared when a new frame starts.</p>
<p>But&hellip; Isn&rsquo;t it confusing to call everything an allocator?
I believe it is, and I don&rsquo;t seem to be the only one because some engines call the global memory allocators <em>arenas</em>.</p>
<p>Therefore, let&rsquo;s stick with the following terminology:</p>
<blockquote>
<p><strong>Allocators</strong> are objects that encapsulate allocation and deallocation of memory</p>
</blockquote>
<blockquote>
<p><strong>Arenas</strong> are independent (often global) allocators</p>
</blockquote>
<blockquote>
<p><strong>Container Allocators</strong> are stateful allocators that manage the memory used by a container</p>
</blockquote>
<h2 id="why-are-they-necessary">Why are they necessary?</h2>
<p>Native allocation needs to work in all scenarios.
It behaves like a general-purpose arena, meaning it can&rsquo;t have limitations, and it must be good enough at doing everything.
All this, while lacking any context about our particular use case.</p>
<p>Knowing this, I can think of three performance benefits from allocators:
<strong>Allocation/free cost</strong>, <strong>memory locality</strong> and <strong>fragmentation</strong>.</p>
<blockquote>
<p><strong>Allocation/Free Cost</strong></p>
<p><code>malloc</code> acts as the intermediary between the program and the OS.
For example, sometimes it will need to request more memory from the kernel, and that is very slow.</p>
</blockquote>
<blockquote>
<p><strong>Memory Locality</strong></p>
<p>Very briefly speaking, modern CPUs read memory through a hierarchy: cache lines, caches and RAM.
Since data is retrieved in blocks into the caches, if the data we need is close together in memory, it&rsquo;s much more likely that it will already be cached.
Accessing RAM instead of CPU cache can be hundreds of times slower.</p>
<p>Since <code>malloc</code> and <code>new</code> don&rsquo;t have context about our memory use-cases, the pointers allocated can be anywhere.
However, allocators can give us much better memory locality.</p>
</blockquote>
<blockquote>
<p><strong>Fragmentation</strong>




<img src="/Assets/Img/fragmentation.png"
  
   alt="Fragmentation" 
/>
Fragmentation occurs when we have allocated and freed multiple times leaving gaps that are not big enough to fit new allocations.</p>
<p>This means we will need to request more memory. Some allocator algorithms don&rsquo;t have fragmentation at all. Others have the information to reduce it further than <code>malloc</code> can.</p>
</blockquote>
<p>From a technical design standpoint, allocators also simplify code: we can <em>visualize</em> where memory is held at all times, and under which rules.
We can use the arena that fits our problem and change it if needed.</p>
<h2 id="types-of-allocators">Types of Allocators</h2>
<p>There are many types of allocators based on their algorithms.
Each of them brings benefits as well as limitations.</p>
<p>There is no way I could explain all of them, but let me give you a quick rundown of the simplest ones.</p>
<h3 id="linear">Linear</h3>
<p>



<img src="/Assets/Img/linear-allocator.png"
  
   alt="Linear Allocator" 
/></p>
<p>A <strong>Linear allocator</strong> reserves a big block of memory and then moves an offset to the next available position when allocating.
Since it doesn&rsquo;t keep track of previous allocations, individual allocations of a linear allocator <strong>can&rsquo;t be freed</strong>. All of them are released at once.</p>
<p>This algorithm is by far the most performant due to its simplicity.
But it also has the most limitations, so its use in the real world is very specific.</p>
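<p>A minimal sketch of the idea, assuming a hypothetical <code>LinearArena</code> class that owns its block through a <code>std::vector</code>:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class LinearArena
{
public:
	explicit LinearArena(std::size_t capacity) : block(capacity) {}

	// Returns nullptr when the block is exhausted. 'alignment' must be a power of two.
	void* Allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t))
	{
		assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

		// Round the next free address up to the requested alignment
		const auto base = reinterpret_cast<std::uintptr_t>(block.data());
		const auto mask = static_cast<std::uintptr_t>(alignment) - 1;
		const std::uintptr_t aligned = (base + offset + mask) & ~mask;
		const auto start = static_cast<std::size_t>(aligned - base);

		if (start > block.size() || size > block.size() - start)
		{
			return nullptr;
		}
		offset = start + size;    // Just move the offset forward
		return block.data() + start;
	}

	// Individual allocations can't be freed, but everything can be released at once.
	void Reset()
	{
		offset = 0;
	}

private:
	std::vector<std::uint8_t> block;
	std::size_t offset = 0;
};
```

<p>An allocation is little more than an addition and a bounds check, which is where the performance comes from. Calling <code>Reset()</code> at the start of each frame gives the per-frame allocator mentioned earlier.</p>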
<h3 id="stack">Stack</h3>
<p>



<img src="/Assets/Img/stack-allocator.png"
  
   alt="Stack Allocator" 
/>
<strong>Stack</strong> is one step more advanced than Linear. It knows the size of all allocations, allowing us to free the <strong>last</strong> allocation.</p>
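<p>As a sketch, a stack allocator only adds a record of sizes on top of the linear one. The hypothetical <code>StackArena</code> below keeps those sizes in a separate vector for clarity (many implementations store them in a small header before each allocation instead) and ignores alignment:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class StackArena
{
public:
	explicit StackArena(std::size_t capacity) : block(capacity) {}

	// Returns nullptr when there is not enough space left.
	void* Allocate(std::size_t size)
	{
		if (size > block.size() - offset)
		{
			return nullptr;
		}
		void* const ptr = block.data() + offset;
		offset += size;
		sizes.push_back(size);    // Remember the size so we can undo this allocation
		return ptr;
	}

	// Only the most recent allocation can be freed.
	void FreeLast()
	{
		assert(!sizes.empty());
		offset -= sizes.back();
		sizes.pop_back();
	}

private:
	std::vector<std::uint8_t> block;
	std::vector<std::size_t> sizes;
	std::size_t offset = 0;
};
```

<p>Freeing in reverse order of allocation can be repeated until the arena is empty again, just like a call stack unwinding.</p>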
<h3 id="pool">Pool</h3>
<p>



<img src="/Assets/Img/pool-allocator.png"
  
   alt="Pool Allocator" 
/></p>
<p>A <strong>Pool</strong> <em>allocator</em> contains a list of same-size slots. Every allocation must fit within one slot.</p>
<p>To track which slots are available, we can use a bitset.
Bitsets are very performant and compact containers where 1 bit represents one slot, marking whether it is occupied.</p>
<p>Some implementations keep track of allocations using a linked list.
However, this means we need to iterate over the entire memory block. It also introduces 8 extra bytes (one pointer on 64-bit platforms) for each allocation.</p>
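<p>Here is a minimal sketch of the bitset approach, assuming a hypothetical <code>PoolArena</code> with a fixed number of slots known at compile time. The search for a free slot is a simple linear scan over the bits:</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t SlotSize, std::size_t SlotCount>
class PoolArena
{
public:
	// Returns nullptr if 'size' doesn't fit in a slot or every slot is taken.
	void* Allocate(std::size_t size)
	{
		if (size > SlotSize)
		{
			return nullptr;
		}
		for (std::size_t i = 0; i < SlotCount; ++i)
		{
			if (!occupied.test(i))
			{
				occupied.set(i);
				return block + i * SlotSize;
			}
		}
		return nullptr;
	}

	// 'ptr' must have been returned by Allocate() on this pool.
	void Free(void* ptr)
	{
		const auto offset =
		    static_cast<std::size_t>(static_cast<std::uint8_t*>(ptr) - block);
		assert(offset % SlotSize == 0 && offset / SlotSize < SlotCount);
		occupied.reset(offset / SlotSize);
	}

private:
	alignas(std::max_align_t) std::uint8_t block[SlotSize * SlotCount];
	std::bitset<SlotCount> occupied;    // 1 bit per slot, set means occupied
};
```

<p>For example, a <code>PoolArena&lt;16, 1024&gt;</code> spends 128 bytes on bookkeeping for 16 KiB of slots, and any slot can be freed in any order.</p>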
<h3 id="general">General</h3>
<p>A <strong>general</strong> allocator can be used for all use-cases and doesn&rsquo;t have any major limitation.
I will soon publish how I implemented a general arena that is up to <strong>130x</strong> faster than <code>malloc</code>.</p>
<h3 id="many-more">Many more!</h3>
<p>Those are not all the allocators that exist. There are many more.
Each algorithm has advantages and disadvantages, and it&rsquo;s up to us to choose the best one for the job.</p>
<p>Some I didn&rsquo;t mention:</p>
<ul>
<li>






  


<a href="https://en.wikipedia.org/wiki/Buddy_memory_allocation"
   >Buddy allocator</a></li>
<li>






  


<a href="https://www.geeksforgeeks.org/operating-system-allocating-kernel-memory-buddy-system-slab-system/"
   >Slab allocator</a></li>
</ul>
<h2 id="native-allocation-replacements">Native allocation replacements</h2>
<p>Some libraries just provide an extra layer between us and <code>malloc</code>, without necessarily using the concepts we described before.
They still lack context about our use-case and need to solve every problem just like <code>malloc</code>. However, they manage to be considerably faster than the default solution.</p>
<p>Depending on what you do, these libraries might be enough. However, setup is not always as intuitive and straightforward as it should be.</p>
<p>One example is 






  
    
  


<a href="https://github.com/microsoft/mimalloc"
    target="_blank"
  >microsoft/mimalloc</a>.</p>
<h2 id="resources">Resources</h2>
<ul>
<li>






  


<a href="https://gamasutra.com/blogs/MichaelKissner/20151104/258271/Writing_a_Game_Engine_from_Scratch__Part_2_Memory.php"
   >Writing a Game Engine from Scratch - Part 2: Memory</a></li>
<li>






  


<a href="https://youtu.be/rX0ItVEVjHc?t=1830"
   >CppCon 2014: Mike Acton &ldquo;Data-Oriented Design and C++&rdquo;</a></li>
<li>






  


<a href="https://www.gamasutra.com/blogs/ThomasYoung/20141002/226898/Custom_Vector_Allocation.php"
   >Custom Vector Allocation</a></li>
<li>Some allocator implementation examples: 






  


<a href="https://github.com/mtrebi/memory-allocators"
   >mtrebi/memory-allocators</a></li>
</ul>
]]></content></item></channel></rss>