get plain text from mail body

Posted: Tue Jul 17, 2012 10:30 am
by Hans Nieuwenhuis
Hi,

I am using the Exchange plugin from It2Be (by the way, nice plugin!) to interact with Exchange 2010.

When I get the body of a mail, it can contain markup (HTML, RTF, Word, ...?).
Is there a way to get the plain text from it?

Regards,

Re: get plain text from mail body

Posted: Tue Jul 17, 2012 10:49 am
by mboegem
Hi Hans,

I haven't had time to play around with the current plugin, but I used the previous Exchange plugin.
That one had 2 properties in the mail object:
- plainMsg
- htmlMsg

Re: get plain text from mail body

Posted: Tue Jul 17, 2012 10:55 am
by Hans Nieuwenhuis
The new ewsj plugin (Exchange 2010) does not have these properties.

Regards,

Re: get plain text from mail body

Posted: Tue Jul 17, 2012 11:09 am
by Hans Nieuwenhuis
Marc,

I saw this entry from you:

Code: Select all
// Parse the HTML with Swing's HTMLEditorKit and read back only the document text
var htmlEditorKit = new Packages.javax.swing.text.html.HTMLEditorKit();
var htmlDocument = htmlEditorKit.createDefaultDocument();

// 'myHtmlTextString' is a placeholder for the HTML to convert
var reader = new java.io.StringReader('myHtmlTextString');
htmlEditorKit.read(reader, htmlDocument, 0);

var result = htmlDocument.getText(0, htmlDocument.getLength());

return utils.stringTrim(result);


If I use this on the mail body, I get an error: Exception Object: javax.swing.text.ChangedCharSetException

Any ideas?

Regards,

Re: get plain text from mail body

Posted: Tue Jul 17, 2012 11:19 am
by mboegem
Hi Hans,

I'm not sure that will solve every situation.
HTML gets more and more advanced, and Java doesn't keep up with all the possibilities.

Anyway, I looked into the plugin and saw that you can get/set the bodyType.

So: does this work?
Code: Select all
myMailObject.bodyType = plugins.it2be_exchangews.type.js_getBody_TEXT();
var _plainText = myMailObject.body;

Re: get plain text from mail body

Posted: Tue Jul 17, 2012 11:23 am
by Hans Nieuwenhuis
Yes, but bodyType is just a boolean or integer with the value 0 or 1:

- plain text: bodyType value is 0
- HTML: bodyType value is 1

Regards,

Hans

Re: get plain text from mail body

Posted: Tue Jul 17, 2012 11:54 am
by Hans Nieuwenhuis
Well,

I also discussed this with It2Be, but I'll have to find some regex to strip out all the markup.

I used this regex to strip the HTML:

Code: Select all
replace(/<[a-zA-Z\/][^>]*>/g,'').replace(/&[^;]+?;/g,'')


but the font definitions stay:

Code: Select all
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
   {font-family:"Cambria Math";
   panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
   {font-family:Calibri;
   panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
   {font-family:Tahoma;
   panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
   {margin:0cm;
   margin-bottom:.0001pt;


This results in:

Code: Select all
l xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:off
ice:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://sche
mas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">


undefined
undefined
undefined
undefinedundefinedv\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
undefinedundefinedundefinedundefinedundefinedundefinedundefined
undefined
undefinedundefinedundefinedundefined
undefined
undefined
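
The regex above only matches tags that start with a letter or a slash, so the comment markers and the CSS text inside the style element are left behind. A minimal sketch of a more thorough regex pass follows, assuming a plain JavaScript helper; the name stripHtmlToText is hypothetical and not part of Servoy or the It2Be plugin. It removes style and script elements with their content first, then comments, then the remaining tags, and decodes a few named entities in a single pass.

```javascript
// Sketch only: stripHtmlToText is a hypothetical helper, not a Servoy or It2Be API.
// Named entities this sketch decodes; every other entity is dropped.
var ENTITIES = { nbsp: ' ', amp: '&', lt: '<', gt: '>', quot: '"' };

function stripHtmlToText(html) {
    return String(html)
        // Remove <style> and <script> elements together with their content
        .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
        // Remove comments, including conditional ones like <!--[if !mso]> ... <![endif]-->
        .replace(/<!--[\s\S]*?-->/g, '')
        // Remove the remaining tags and declarations such as <!DOCTYPE ...>
        .replace(/<[a-zA-Z\/!][^>]*>/g, '')
        // Decode known entities and drop unknown ones, in one pass to avoid double decoding
        .replace(/&(#?\w+);/g, function (match, name) {
            return ENTITIES.hasOwnProperty(name) ? ENTITIES[name] : '';
        })
        // Collapse every run of whitespace (line breaks included) into one space
        .replace(/\s+/g, ' ')
        // Trim leading and trailing whitespace
        .replace(/^\s+|\s+$/g, '');
}
```

It could be called as stripHtmlToText(myMailObject.body). Regular expressions are not a full HTML parser, so unusual markup may still leave residue behind.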

Re: get plain text from mail body

Posted: Tue Jul 17, 2012 3:11 pm
by Hans Nieuwenhuis
Got it working by using some code from a post by Marc Boegem.

I just added the line that ignores the charset directive, and then it worked.

Code: Select all
var htmlEditorKit = new Packages.javax.swing.text.html.HTMLEditorKit();
var htmlDocument = htmlEditorKit.createDefaultDocument();

// Ignore the charset <meta> tag in the mail body; without this, read() throws ChangedCharSetException
htmlDocument.putProperty("IgnoreCharsetDirective", true);
var reader = new java.io.StringReader(_mail.body);
htmlEditorKit.read(reader, htmlDocument, 0);
// Keep only the text content of the parsed document
var result = htmlDocument.getText(0, htmlDocument.getLength());
_noHtml = utils.stringTrim(result);


Regards, and thanks Marc!
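
The mail body quoted earlier in the thread contains a meta tag with charset=utf-8. Swing's HTML parser raises ChangedCharSetException when it meets such a directive unless the document's IgnoreCharsetDirective property is true, which is why the extra line fixes the error. As a debugging aid, here is a minimal sketch that checks whether a body carries such a directive; the name hasCharsetDirective is hypothetical, and the check is a rough regex test rather than a real parse.

```javascript
// Sketch only: hasCharsetDirective is a hypothetical helper, not a Servoy or It2Be API.
// Returns true when the HTML contains a <meta ...> tag that mentions charset=...
function hasCharsetDirective(html) {
    return /<meta\b[^>]*\bcharset\s*=/i.test(String(html));
}
```

For the body shown above, hasCharsetDirective(_mail.body) would report the directive that HTMLEditorKit.read() trips over when IgnoreCharsetDirective is not set.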