+Since parallelising the search task did not yield a big gain, we could either develop a new parallel strategy for the search task or parallelise the matrix initialisation instead.
+
+\section*{Improving performance}
+We start by updating the function parallel\_search\_cols() so that each thread keeps the highest value it has found in a local variable and performs only one comparison against the atomic variable. This change alone makes the program finish with a mean time of 8.192 seconds (using 8 threads on a matrix with 40k rows and columns). Comparing with the values in tables 1 and 2, we can validate the benefit of this change by computing $\mathit{efficiency} = \frac{1.1335}{8} \approx 14.17\%$.
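+
+As a worked check of the figure above, the calculation can be written out under the usual definitions of speedup and efficiency. The symbols $T_1$ (sequential wall-clock time), $T_p$ (wall-clock time with $p$ threads), $S_p$ and $E_p$ are introduced here for illustration only, and the value of $T_1$ is inferred from the reported speedup rather than measured:
+\[
+S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}, \qquad E_8 = \frac{1.1335}{8} = 0.1416875
+\]
+\[
+T_1 = S_8 \cdot T_8 = 1.1335 \times 8.192\,\mathrm{s} \approx 9.29\,\mathrm{s}
+\]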
+
+The second task, matrix initialisation, simulates the sequential section of a program in this study, but we can try to parallelise this operation as an extra exercise. The function init\_parallel\_matrix() was defined to populate segments of the matrix from different threads, using an algorithm similar to the one used for the column search. With all configurations in place, the program now takes 4.821 seconds of wall-clock time with 4 threads. The following metrics were obtained:
+
+
+\begin{figure}[h]
+\begin{minipage}[c]{0.45\textwidth}
+ \begin{itemize}
+ \item speedup = 1.9262
+ \item efficiency = 48.15\%
+ \end{itemize}
+\end{minipage}
+\begin{minipage}[c]{0.5\textwidth}
+\includegraphics[width=0.925\textwidth]{imagens/matrixsearchsequencial-07.png}
+\end{minipage}
+\end{figure}
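+
+These two metrics can be cross-checked against each other, assuming the speedup is measured against the sequential wall-clock time. The symbols $S_4$, $E_4$, $T_4$ and $T_1$ are notation introduced here for illustration, and $T_1$ is inferred from the reported values rather than measured:
+\[
+E_4 = \frac{S_4}{4} = \frac{1.9262}{4} = 0.48155, \qquad T_1 = S_4 \cdot T_4 = 1.9262 \times 4.821\,\mathrm{s} \approx 9.29\,\mathrm{s}
+\]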
+
+
+\section*{Conclusions}
+This study revealed a few characteristics of modern CPU behaviour. Apart from being able to spawn thousands of virtual threads to perform calculations, far more than expected, the framework they use can also predict and simplify computations. In the last experiment, with both tasks parallelised, the program spends very similar total CPU time on each of them: less than 9 seconds, or less than 2.25 seconds per thread. Compared with the values presented previously for the 2nd and 3rd tasks before parallelisation, which were considerably further apart, we can almost argue that the CPU still ``remembers'' the matrix while searching it.