Data Source Tutorial Appendix B: Data Parser
Polycom, Inc. 187
First we need to extract the value of the title attribute using another GetTag
rule (which, again, is inserted as a child of the first rule). This rule extracts
the text between the string title=" (which includes the attribute's opening
quote character) and the closing quote character:
<ParsingRule type="GetTag" source="DaySource" result="Condition">
<StartTag>title="</StartTag>
<EndTag>"</EndTag>
</ParsingRule>
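To make the GetTag behavior concrete, here is a minimal Python sketch of what the rule does to its source string. It assumes the rule returns the text between the first occurrence of StartTag and the next occurrence of EndTag, with both tags excluded. The function name get_tag, the empty-string result when a tag is missing, and the sample DaySource fragment are all hypothetical; this models the rule's effect, not the Data Parser's implementation.

```python
def get_tag(source: str, start_tag: str, end_tag: str) -> str:
    """Hypothetical model of the GetTag rule.

    Returns the text between the first start_tag and the next end_tag,
    excluding both tags. Returning "" when a tag is missing is an
    assumption, not documented behavior.
    """
    start = source.find(start_tag)
    if start == -1:
        return ""
    start += len(start_tag)          # skip past the start tag itself
    end = source.find(end_tag, start)  # search for the end tag after it
    if end == -1:
        return ""
    return source[start:end]


# Hypothetical fragment of a DaySource table cell (sample data)
day_source = '<img src="rain.gif" title="Heavy Rain Chance for Precip 80%">'
condition = get_tag(day_source, 'title="', '"')
print(condition)  # Heavy Rain Chance for Precip 80%
```

Because the search for EndTag starts after StartTag, the quote characters in earlier attributes (such as src) do not interfere.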
Then we remove the "Chance for" string, and any text that follows
it, using the TrimFromStart rule:
<ParsingRule type="TrimFromStart" source="Condition" result="Condition">
<SearchText>Chance for</SearchText>
</ParsingRule>
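Under similar caveats, TrimFromStart can be modeled as cutting the source at the first literal occurrence of SearchText. The function trim_from_start, the sample value, and the "unchanged if not found" behavior are assumptions for illustration:

```python
def trim_from_start(source: str, search_text: str) -> str:
    """Hypothetical model of TrimFromStart: drop search_text and everything after it.

    search_text is matched literally. If it does not occur, the source is
    returned unchanged (an assumption, not documented behavior).
    """
    # str.partition returns (source, "", "") when search_text is absent
    before, _sep, _rest = source.partition(search_text)
    return before


condition = trim_from_start("Heavy Rain Chance for Precip 80%", "Chance for")
print(repr(condition))  # 'Heavy Rain '
```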
This leaves us with the desired text, but with a trailing space character (e.g.
"Heavy Rain "). We remove this extra space with a Trim rule:
<ParsingRule type="Trim" source="Condition" result="Condition" />
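Putting the three condition rules together, the following Python sketch traces a hypothetical title value through GetTag, TrimFromStart, and Trim in order. It uses plain string methods as stand-ins for the rules and assumes that Trim strips both leading and trailing whitespace:

```python
# Hypothetical DaySource fragment (sample data, not from a real feed)
day_source = '<img src="rain.gif" title="Heavy Rain Chance for Precip 80%">'

# GetTag: text between 'title="' and the next '"'
start = day_source.index('title="') + len('title="')
condition = day_source[start:day_source.index('"', start)]

# TrimFromStart: keep only the text before "Chance for"
condition = condition.partition("Chance for")[0]  # 'Heavy Rain '

# Trim: remove leading and trailing whitespace (assumed behavior)
condition = condition.strip()
print(condition)  # Heavy Rain
```

Each step reads and writes the same Condition value, mirroring how the rules above use source="Condition" and result="Condition".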
Why didn't we just use a <space> token in the SearchText parameter of the
TrimFromStart rule (e.g. <SearchText><space>Chance for</SearchText>)?
Because TrimFromStart's SearchText parameter does not support the special
"tag" syntax, it would not interpret <space> as a space character. It would
search for the literal characters "<space>Chance for" instead, which never
appear in the title text.
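The difference is easy to see in a short Python sketch, assuming SearchText is matched as a plain literal string. The title value is hypothetical:

```python
title = "Heavy Rain Chance for Precip 80%"

# Matched literally, "<space>" is seven ordinary characters, so nothing matches
literal = "<space>Chance for"
print(title.find(literal))  # -1

# Only a parameter that expands the token would search for " Chance for"
expanded = literal.replace("<space>", " ")
print(title.find(expanded))  # 10
```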
Now let's turn to the temperature. In each table cell, the temperature string is
always found between a <br> tag and a closing </font> tag (e.g. <br>Hi <font
color="#FF0000">81°F</font>). So we use those tags in a GetTag rule to
extract the temperature:
<ParsingRule type="GetTag" source="DaySource" result="Temp">
<StartTag><br></StartTag>
<EndTag></font></EndTag>
</ParsingRule>
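Modeled in Python on the sample cell from the text, and assuming GetTag excludes both tags from its result, this step leaves the opening font tag in place:

```python
cell = '<br>Hi <font color="#FF0000">81°F</font>'

# GetTag: text between "<br>" and the next "</font>", tags excluded
start = cell.index("<br>") + len("<br>")
temp = cell[start:cell.index("</font>", start)]
print(temp)  # Hi <font color="#FF0000">81°F
```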
This gets us the temperature, but with the opening tag <font color="#FF0000">
embedded inside. We want to keep the "Hi" or "Lo" prefix, so we only need to
remove that opening <font …> tag (the closing </font> tag was already excluded
by the EndTag parameter). If you look at the HTML code, you'll see that "Hi"
temperatures get a font color of #FF0000 while "Lo" temperatures get #0033CC.
Since we want to remove the font tag regardless of which color is specified, we
use a ReplaceTag rule with an empty NewText parameter to replace everything
from the "<font" through the next ">" with nothing:
<ParsingRule type="ReplaceTag" source="Temp" result="Temp">
<StartTag><font</StartTag>
<EndTag>></EndTag>
<NewText/>
</ParsingRule>
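Here is a Python sketch of the ReplaceTag step, assuming the rule replaces each span from StartTag through the next EndTag (both included) with NewText. Replacing every occurrence rather than only the first is an assumption, as are the function name replace_tag and the sample "Lo" value:

```python
def replace_tag(source: str, start_tag: str, end_tag: str, new_text: str = "") -> str:
    """Hypothetical model of ReplaceTag.

    Replaces each span running from start_tag through the next end_tag
    (both included) with new_text. Spans without a matching end_tag are
    left untouched.
    """
    parts = []
    pos = 0
    while True:
        start = source.find(start_tag, pos)
        if start == -1:
            break
        end = source.find(end_tag, start + len(start_tag))
        if end == -1:
            break
        parts.append(source[pos:start])  # text before the tag
        parts.append(new_text)           # replacement for the whole tag
        pos = end + len(end_tag)         # resume after the end tag
    parts.append(source[pos:])
    return "".join(parts)


print(replace_tag('Hi <font color="#FF0000">81°F', "<font", ">"))  # Hi 81°F
print(replace_tag('Lo <font color="#0033CC">64°F', "<font", ">"))  # Lo 64°F
```

Matching from "<font" to the first ">" after it is what makes the rule color-independent: whatever attributes appear inside the tag are consumed along with it.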