get plain text from mail body
Posted:
Tue Jul 17, 2012 10:30 am
by Hans Nieuwenhuis
Hi,
I am using the Exchange plugin from It2Be (by the way, nice plugin!) to interact with Exchange 2010.
When I get the body of a mail, it can contain markup (HTML, RTF, Word, ...).
Is there a way to get the plain text from it?
Regards,
Re: get plain text from mail body
Posted:
Tue Jul 17, 2012 10:49 am
by mboegem
Hi Hans,
I haven't had time to play around with the current plugin, but I used the previous Exchange plugin.
That one had 2 properties in the mail object:
- plainMsg
- htmlMsg
Re: get plain text from mail body
Posted:
Tue Jul 17, 2012 10:55 am
by Hans Nieuwenhuis
The new ewsj (Exchange 2010) plugin does not have these...
Regards,
Re: get plain text from mail body
Posted:
Tue Jul 17, 2012 11:09 am
by Hans Nieuwenhuis
Marc,
I saw this entry from you :
- Code: Select all
var htmlEditorKit = new Packages.javax.swing.text.html.HTMLEditorKit();
var htmlDocument = htmlEditorKit.createDefaultDocument();
var reader = new java.io.StringReader('myHtmlTextString');
htmlEditorKit.read(reader, htmlDocument, 0);
var result = htmlDocument.getText(0, htmlDocument.getLength());
return utils.stringTrim(result);
If I use this on the mail body, I get an error: Exception Object: javax.swing.text.ChangedCharSetException
Any ideas?
Regards,
Re: get plain text from mail body
Posted:
Tue Jul 17, 2012 11:19 am
by mboegem
Hi Hans,
I'm not sure that will work in every situation.
HTML gets more and more advanced, and Java doesn't keep up with all the possibilities.
Anyway, I looked into the plugin and saw that you can get/set the body type.
So: does this work?
- Code: Select all
myMailObject.bodyType = plugins.it2be_exchangews.type.js_getBody_TEXT();
var _plainText = myMailObject.body;
Re: get plain text from mail body
Posted:
Tue Jul 17, 2012 11:23 am
by Hans Nieuwenhuis
Yes, but bodyType is just a boolean or integer flag with value 0 or 1:
plain text: bodyType = 0
HTML: bodyType = 1
Regards,
Hans
Re: get plain text from mail body
Posted:
Tue Jul 17, 2012 11:54 am
by Hans Nieuwenhuis
Well,
I also discussed this with It2Be, but it looks like I'll have to find some regex to strip out all the markup.
I used this regex to strip the HTML:
- Code: Select all
replace(/<[a-zA-Z\/][^>]*>/g,'').replace(/&[^;]+?;/g,'')
but the font stuff stays:
- Code: Select all
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
Results in:
- Code: Select all
l xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:off
ice:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://sche
mas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
undefined
undefined
undefined
undefinedundefinedv\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
undefinedundefinedundefinedundefinedundefinedundefinedundefined
undefined
undefinedundefinedundefinedundefined
undefined
undefined
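One likely reason the font definitions survive: the pattern above only removes tags, while the CSS sits as plain text between <style> and </style> and inside <!-- --> comments, so it is never matched. Below is a minimal sketch of a regex-only cleanup that drops comments and style/script blocks before the remaining tags. The name stripHtml is a hypothetical helper (not part of the Exchange plugin or Servoy), and regex stripping is only a rough approximation of real HTML parsing.

```javascript
// Hypothetical helper, not part of the It2Be plugin or Servoy.
// Rough regex-based HTML-to-text conversion; not a full HTML parser.
function stripHtml(html) {
    return String(html)
        // 1. Drop comments, including Word's conditional <!--[if ...]> blocks.
        .replace(/<!--[\s\S]*?-->/g, '')
        // 2. Drop <style> and <script> elements together with their contents.
        .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
        // 3. Drop the remaining tags (and declarations such as <!DOCTYPE ...>).
        .replace(/<[a-zA-Z\/!][^>]*>/g, '')
        // 4. Decode a few common entities; &amp; goes last to avoid double decoding.
        .replace(/&nbsp;/g, ' ')
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&amp;/g, '&')
        // 5. Collapse whitespace runs and trim both ends.
        .replace(/\s+/g, ' ')
        .replace(/^\s+|\s+$/g, '');
}

// Usage with a shortened, made-up Word-style body:
var sample = '<html><head><!--[if !mso]><style>v\\:* {behavior:url(#default#VML);}</style><![endif]-->' +
    '<style><!--\n@font-face {font-family:Calibri;}\n--></style></head>' +
    '<body><p class="MsoNormal">Hello&nbsp;<b>Hans</b> &amp; Marc</p></body></html>';
var plain = stripHtml(sample); // "Hello Hans & Marc"
```

The order matters: comments go first so that Word's conditional blocks disappear as a whole, and style blocks go before the generic tag pattern so that their CSS text is not left behind.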
Re: get plain text from mail body
Posted:
Tue Jul 17, 2012 3:11 pm
by Hans Nieuwenhuis
Got it working by using some code from a post by Marc Boegem.
I just added the line regarding the character set (the IgnoreCharsetDirective property), and then it worked.
- Code: Select all
var htmlEditorKit = new Packages.javax.swing.text.html.HTMLEditorKit();
var htmlDocument = htmlEditorKit.createDefaultDocument();
htmlDocument.putProperty("IgnoreCharsetDirective", true);
var reader = new java.io.StringReader(_mail.body);
htmlEditorKit.read(reader, htmlDocument, 0);
var result = htmlDocument.getText(0, htmlDocument.getLength());
_noHtml = utils.stringTrim(result);
Regards and thanks Marc !!