+\begin{table}[htp]
+\begin{center}
+\begin{tabular}{ccc}
+\# Threads & Columns & Blocks \\
+\hline
+4 & 16.627 & 16.626\\
+8 & 12.812 & 17.125\\
+16 & 10.572 & 10.372\\
+32 & 9.265 & 9.466 \\
+49 & 8.664 & 8.422 \\
+64 & 8.28 & 8.520 \\
+400 & 7.749 & 10.884 \\
+800 & 7.916 & 10.613 \\
+1000 & 8.126 & 10.560
+\end{tabular}
+\end{center}
+\caption{Program user time for each strategy by number of threads, for a matrix with 40k rows and columns.}
+\label{wall-cpu time of parallel strategies}
+\end{table}%
+
+Combining the values from tables 1 and 2 with the formulas presented before, we can compute the performance metrics for both strategies:
+
+\begin{table}[htp]
+\begin{center}
+\begin{tabular}{cccc}
+Strategy & \# Threads & Speedup & Efficiency (\%)\\
+\hline
+Columns & 400 & 1.1467 & 0.3\\
+Blocks & 400 & 0.8164 & 0.2 \\
+Columns & 49 & 1.026 & 2.1 \\
+Blocks & 49 & 1.0526 & 2.1
+\end{tabular}
+\end{center}
+\caption{Performance of parallel strategies.}
+\label{performance of parallel strategies}
+\end{table}%
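+
+As a worked check of the table above, assume the formulas referred to are the usual definitions. They are not restated in this section, so the symbols $T_1$, $T_p$, $S_p$, $E_p$ and $p$ are notation introduced here for illustration. Speedup and efficiency for $p$ threads can then be written as
+\[
+S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} \times 100\%,
+\]
+where $T_1$ is the sequential time and $T_p$ the time measured with $p$ threads. For the blocks strategy with 49 threads, for instance,
+\[
+E_{49} = \frac{1.0526}{49} \times 100\% \approx 2.1\%,
+\]
+which agrees with the efficiency reported in the table.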
+
+We can conclude that parallelising our initial program did not yield a significant performance gain. This is probably due in part to the nature of the problem: since the largest value is always in the last entry of the matrix, the threads have to loop through their entire batch of items to compare values. It is also due to the implementation: as mentioned before, the function run\_parallel\_search (int num\_threads) could be improved so that each thread accesses the atomic variable only once.
+
+Scalability can be divided into two metrics. \underline{Strong scaling} evaluates how much faster a program gets as resources (threads) are added to a fixed problem size; we already carried out this analysis in table 2. There, we can see that splitting the matrix search by columns performs better as resources increase. The blocks strategy loses performance sooner, and figure 1 shows that it needs many more resources to reach a total CPU time similar to that of the columns split. \underline{Weak scaling} evaluates whether a program can handle larger problems in the same amount of time when more resources are added; ideally, the execution time remains constant as both the work and the number of threads are scaled. From the next plot we can conclude that the columns split is the better strategy for the particular situation under study.
+
+\begin{figure}[h]
+ \centering
+ \includegraphics[width=0.6\textwidth]{imagens/matrixsearchsequencial-06.png}
+ \caption{Weak scalability evaluation from 1 to 8 threads.}
+\end{figure}
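+
+The two notions of scalability discussed above can be stated compactly. The following is a sketch under the usual textbook definitions; the notation $T(n, p)$, for the run time on an amount of work $n$ (for example, the number of matrix entries) with $p$ threads, is an assumption introduced here and is not defined earlier in this report:
+\[
+E_{\mathrm{strong}}(p) = \frac{T(n, 1)}{p \, T(n, p)}, \qquad E_{\mathrm{weak}}(p) = \frac{T(n, 1)}{T(p\,n, p)}.
+\]
+In both cases a value close to 1 is ideal: under strong scaling the time falls as $1/p$ for a fixed $n$, while under weak scaling the time stays constant when the work grows in proportion to the number of threads.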
+
+Since parallelising the search task did not bring a big gain, we could either develop a new parallel strategy for the search or parallelise the matrix initialisation instead.
+
+\section*{Improving performance}
+tbd