## pgfplots : How to add a linear regression line? (Topic is solved)

Cham
Posts: 826
Joined: Sat Apr 02, 2011 4:06 pm

### pgfplots : How to add a linear regression line?

I'm studying the pgfplots package to make some cool graphics using experimental data. I usually use Excel for this kind of stuff, but I'm interested in doing the same directly with LaTeX.

The trouble with most LaTeX documentation is that it's extremely heavy, and the examples are too often "out of this world". I'm having difficulty finding the proper setup to add a linear regression line, and to customize the markers and the tick numbers. Here's an MWE to work with :

    \documentclass[11pt,letterpaper,twoside]{article}
    \usepackage[total={6.5in,10in},left=1in,top=0.5in,includehead,includefoot]{geometry}
    \usepackage{pgfplots}
    \pgfplotsset{compat=newest}

    \begin{document}

    Some cute data graphic :
    \begin{center}
    \begin{tikzpicture}
    \begin{axis}[
        height=9cm,
        width=9cm,
        grid=both,
        tick align=inside,
        minor tick num=3,
        major tick style={black,thin},
        minor tick style={black},
        major grid style={color=gray!60,densely dashed},
        minor grid style={color=gray!50,densely dotted},
        tick label style={font=\footnotesize},
        label style={font=\normalsize},
        xlabel=$X$,
        ylabel=$Y$,
        title={Some title},
        xtick={-1, 0, 1, 2, 3, 4, 5},
        ytick={-1, 0, 1, 2, 3, 4, 5, 6}
    ]
    \addplot[blue,mark=*,mark size=1.10,only marks,error bars/.cd,x dir=both,y dir=both,x explicit,y explicit]
    coordinates {
        (0, 0) +- (0.1, 0.4)
        (0.5, 1) +- (0.4, 0.2)
        (1, 2) +- (0.2, 0.4)
        (1.55, 3.25) +- (0.15, 0.3)
        (2, 5) +- (0.3, 0.3)};
    \end{axis}
    \end{tikzpicture}
    \end{center}

    \end{document}

Here's a preview, with a few things shown in red that I want to add in the plot :
[Attachment: graphic.jpg]

1. How to modify the code above to add a linear regression line, with options for its style (color and thickness)?

2. How to show its equation (with many digits)? and to change its "X" and "Y" symbols? (so the equation is using exactly the same symbols as on the axis).

3. How to add more digits to the tick label numbers, on both axes? Writing 0.0, 1.0, ... in the code does nothing.

4. How to change the marker's color, and the error bars color separately? For example the markers in blue, and all the error bars in red?

Cham
Posts: 826
Joined: Sat Apr 02, 2011 4:06 pm
The answer to question 4 above is simply to add the following option :
error bar style={red,thin} % ultra thin, very thin, thick, ...
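
As a sketch of where this option fits, assuming it is appended after `error bars/.cd` in the `\addplot` options (the same placement used in the longer code later in this thread), the markers stay blue while the error bars are drawn in red. The data is a shortened copy of the MWE's coordinates:

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\begin{axis}
  \addplot[
    blue, mark=*, mark size=1.10, only marks, % marker appearance
    error bars/.cd,                           % the keys below configure the error bars
    x dir=both, y dir=both, x explicit, y explicit,
    error bar style={red,thin}                % colour and thickness of the bars only
  ]
  coordinates {
    (0, 0)   +- (0.1, 0.4)
    (0.5, 1) +- (0.4, 0.2)
    (1, 2)   +- (0.2, 0.4)
  };
\end{axis}
\end{tikzpicture}
\end{document}
```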

EDIT : The answer to question 3 appears to be this, added to the axis options :
xticklabel style={/pgf/number format/precision=1,/pgf/number format/fixed zerofill},

That one was not obvious!
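
The reason writing 0.0 in `xtick` changes nothing is that `xtick` only sets the tick positions; the labels are typeset by the PGF number printer. A minimal sketch, assuming the same two number-format keys are wanted on both axes (the data and the precisions are made up for illustration):

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\begin{axis}[
  xtick={0, 1, 2},
  ytick={0, 2, 4},
  % one decimal on the x axis, two on the y axis, padded with zeros
  xticklabel style={/pgf/number format/precision=1,/pgf/number format/fixed zerofill},
  yticklabel style={/pgf/number format/precision=2,/pgf/number format/fixed zerofill}
]
  \addplot[blue, only marks, mark=*] coordinates {(0,0) (1,2) (2,4)};
\end{axis}
\end{tikzpicture}
\end{document}
```

With these settings the x labels should read 0.0, 1.0, 2.0 and the y labels 0.00, 2.00, 4.00.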

mas
Posts: 204
Joined: Thu Dec 04, 2008 4:39 am
1. How to modify the code above to add a linear regression line, with options for its style (color and thickness)?

\addplot+[no marks,red, thick] {1.2304 * x + 0.5121 } ;
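
One caveat with a hand-written expression: pgfplots samples expressions over its default `domain` of -5:5, so in an axis without explicit `xmin`/`xmax` the line can stretch the plot well beyond the data. A minimal sketch, assuming the coefficients above are used exactly as given and that the line should only span the data range 0 to 2:

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\begin{axis}[xlabel=$X$, ylabel=$Y$]
  % data points from the MWE, without error bars
  \addplot[blue, mark=*, only marks]
    coordinates {(0,0) (0.5,1) (1,2) (1.55,3.25) (2,5)};
  % hand-written line, restricted to the x range of the data;
  % two samples are enough for a straight line
  \addplot[red, thick, no marks, domain=0:2, samples=2] {1.2304*x + 0.5121};
\end{axis}
\end{tikzpicture}
\end{document}
```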

2. How to show its equation (with many digits)? and to change its "X" and "Y" symbols? (so the equation is using exactly the same symbols as on the axis).

The usual tikz node command should do the trick.
\node at (1.5,0) {Y=1.2304 X + 0.5121} ;

OS: Debian/GNU Linux; LaTeX System : TeXLive; Editor : Vim

Cham
Posts: 826
Joined: Sat Apr 02, 2011 4:06 pm
This is working great. Thanks a lot for the trick. I've found another way, more complicated, but which has the advantage of calculating the proper trend line from the data points. The legend can also give the proper linear equation with any number of digits. I may give the code later, but it's much more complicated than your solution.

mas
Posts: 204
Joined: Thu Dec 04, 2008 4:39 am
Good to hear that your problem is solved. pgfplots can do regression and draw the appropriate trend line. Since that was not the question, I did not suggest it. Good idea to post your solution when you find time.

Cham
Posts: 826
Joined: Sat Apr 02, 2011 4:06 pm
I'm almost getting it right! Here's the complete code (a bit complex. How can I simplify it ?) :

    \documentclass[11pt,letterpaper,twoside]{article}
    \usepackage[total={6.5in,10in},left=1in,top=0.5in,includehead,includefoot]{geometry}
    \usepackage{pgfplots,pgfplotstable}
    \pgfplotsset{compat=newest}

    \begin{document}

    \begin{center}
    \begin{tikzpicture}
    \begin{axis}[
        height=12cm,
        width=15cm,
        grid=both,
        tick align=inside,
        minor tick num=4,
        major tick style={black,thin},
        minor tick style={black},
        major grid style={color=gray!60,densely dashed},
        minor grid style={color=gray!50,densely dotted},
        tick label style={font=\footnotesize},
        label style={font=\normalsize},
        xlabel=$X$,
        ylabel=$Y$,
        xmin=-0.5, xmax=2.5,
        ymin=-1, ymax=6,
        title={Put an hilarious title here},
        xtick={-1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5},
        ytick={-1, 0, 1, 2, 3, 4, 5, 6},
        xticklabel style={/pgf/number format/precision=2,/pgf/number format/fixed zerofill},
        yticklabel style={/pgf/number format/precision=0,/pgf/number format/fixed zerofill},
        legend pos=south east,
        legend style={empty legend}
    ]
    % Data points with error bars :
    \addplot[
        blue,
        mark=*,
        mark size=1.20,
        only marks,
        error bars/.cd,
        x dir=both,
        y dir=both,
        x explicit,
        y explicit,
        error bar style={black,semithick}
    ]
    coordinates{
        (0, 0) +- (0.1, 0.3)
        (0.5, 1) +- (0.2, 0.15)
        (1.1, 2.3) +- (0.2, 0.4)
        (1.55, 3.25) +- (0.06, 0.5)
        (2, 4.7) +- (0.3, 0.3)};
    % Average slope line :
    \addplot[black,thick,empty legend]
    table[y={create col/linear regression={y=Y}}]
    {
        X Y
        0 0
        0.5 1
        1.1 2.3
        1.55 3.25
        2 4.7
    };
    \addlegendentry[/pgf/number format/precision=6]{$\ell_{ave} = \pgfmathprintnumber[]{\pgfplotstableregressiona} \, m \pgfmathprintnumber[print sign]{\pgfplotstableregressionb}$};
    % Maximal slope line :
    \addplot[
        black,
        thick,
        mark=*,
        mark size=1.3,
        mark options={fill=white},
        only marks
    ]
    coordinates{
        (0.1, -0.41763)
        (1.8, 5.009598)
    };
    \addplot[thick,red,empty legend]
    table[y={create col/linear regression={y=Y}}]
    {
        X Y
        0.1 -0.41763
        1.8 5.009598
    };
    \addlegendentry[/pgf/number format/precision=6]{$\ell_{max} = \pgfmathprintnumber[]{\pgfplotstableregressiona} \, m \pgfmathprintnumber[print sign]{\pgfplotstableregressionb}$};
    % Minimal slope line :
    \addplot[
        black,
        thick,
        mark=*,
        mark size=1.3,
        mark options={fill=white},
        only marks
    ]
    coordinates{
        (-0.1, 0.18237)
        (2.4, 4.409598)
    };
    \addplot[thick,olive,empty legend]
    table[y={create col/linear regression={y=Y}}]
    {
        X Y
        -0.1 0.18237
        2.4 4.409598
    };
    \addlegendentry[/pgf/number format/precision=6]{$\ell_{min} = \pgfmathprintnumber[]{\pgfplotstableregressiona} \, m \pgfmathprintnumber[print sign]{\pgfplotstableregressionb}$};
    \end{axis}
    \end{tikzpicture}
    \end{center}

    \end{document}

Here's an hilarious preview, with an unsolved problem indicated in red :
[Attachment: graph.jpg]

The problem is the numbers in the three linear trend equations in the legend. How can I tell LaTeX to show the proper numbers calculated for each trend line? Currently, all three equations repeat the numbers of the third line (the minimal slope line).

Cham
Posts: 826
Joined: Sat Apr 02, 2011 4:06 pm
The pgfplots manual says on page 394 that the numbers \pgfplotstableregressiona and \pgfplotstableregressionb are stored globally. This explains why my equations are showing the same numbers for all three lines. How can I show the numbers properly? The manual isn't clear about this.

Cham
Posts: 826
Joined: Sat Apr 02, 2011 4:06 pm
Ahaa! I think I get it. From page 395 of the manual, I need to use the optional commands
\xdef\slopeA{\pgfplotstableregressiona}\xdef\bA{\pgfplotstableregressionb}

then use the new commands \slopeA and \bA instead of \pgfplotstableregressiona and \pgfplotstableregressionb in my equation (in the legend). It appears to work nicely.
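
A reduced sketch of this pattern, assuming two of the fits from the code above. The likely reason it works is that legend entries are only typeset when the axis ends, by which time the two global macros hold the values of the last fit, whereas `\xdef` expands and stores them right after each `\addplot`. The names `\slopeA` and `\bA` come from the post; `\slopeB` and `\bB` are made up here for the second line:

```latex
\documentclass{article}
\usepackage{pgfplots,pgfplotstable}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\begin{axis}[legend pos=south east]
  % First fit: average slope line
  \addplot[black,thick] table[y={create col/linear regression={y=Y}}] {
    X Y
    0 0
    0.5 1
    1.1 2.3
    1.55 3.25
    2 4.7
  };
  % Freeze the coefficients of THIS fit before the next one overwrites them
  \xdef\slopeA{\pgfplotstableregressiona}
  \xdef\bA{\pgfplotstableregressionb}
  \addlegendentry[/pgf/number format/precision=6]{$\ell_{ave} = \pgfmathprintnumber{\slopeA} \, m \pgfmathprintnumber[print sign]{\bA}$}
  % Second fit: maximal slope line
  \addplot[red,thick] table[y={create col/linear regression={y=Y}}] {
    X Y
    0.1 -0.41763
    1.8 5.009598
  };
  \xdef\slopeB{\pgfplotstableregressiona}
  \xdef\bB{\pgfplotstableregressionb}
  \addlegendentry[/pgf/number format/precision=6]{$\ell_{max} = \pgfmathprintnumber{\slopeB} \, m \pgfmathprintnumber[print sign]{\bB}$}
\end{axis}
\end{tikzpicture}
\end{document}
```

Each legend entry should then show the coefficients of its own line, since it refers to a macro that is no longer touched by later fits.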

Stefan Kottwitz
Posts: 8596
Joined: Mon Mar 10, 2008 9:44 pm
Location: Hamburg, Germany
Interesting!

Just a short remark: I noticed that you wrote the indices ave, max and min in italic. It's common practice to write variables in italic, but operators (and such names) in upright shape. That's why there are commands \max and \min. Also units are commonly written upright, so they are not mistaken for variables (m for meter instead of the variable m).

Stefan
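
A small sketch of the typographic remark above, assuming plain LaTeX math with `\mathrm` for the label "ave" (the numeric value and the unit are invented for illustration):

```latex
\documentclass{article}
\begin{document}
% Italic subscripts look like products of the variables a, v, e, ...
Italic subscripts: $\ell_{ave}$, $\ell_{max}$, $\ell_{min}$.

% Upright subscripts read as names; \max and \min are already upright operators
Upright subscripts: $\ell_{\mathrm{ave}}$, $\ell_{\max}$, $\ell_{\min}$.

% An upright unit next to an italic variable m
Unit versus variable: $\ell = 1.5\,\mathrm{m}$ compared with $\ell = 1.5\,m$.
\end{document}
```

In the legend entries of the code above, this would amount to replacing `\ell_{ave}` with `\ell_{\mathrm{ave}}`, `\ell_{max}` with `\ell_{\max}`, and `\ell_{min}` with `\ell_{\min}`.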