Data Source Tutorial Appendix B: Data Parser
Polycom, Inc. 185
Now that we've isolated the weather data for a single day part, we can extract
that data and save it in a row of our table. First we obtain the name of the day
part using a GetTag rule to extract the text that lies between the <b> and </b>
tags in the DaySource container:
<ParsingRule type="GetTag" source="DaySource" result="DayPart">
  <StartTag><b></StartTag>
  <EndTag></b></EndTag>
</ParsingRule>
There's a subtle problem here. What we want is to extract the text from
between the first (indeed, the only) pair of <b> and </b> tags. However,
simply adding the above rule to the <DataItem> element will not work:
<DataItem NumRecords="9" ClearTableOnStart="True">
  <ParsingRule type="GetTag" Source="Source" result="DaySource">
    <StartTag><td </StartTag>
    <EndTag></td></EndTag>
  </ParsingRule>
  <ParsingRule type="GetTag" source="DaySource" result="DayPart">
    <StartTag><b></StartTag>
    <EndTag></b></EndTag>
  </ParsingRule>
</DataItem>
The above code extracts only the first day part's name and extracts nothing
for subsequent day parts. To understand why, recall that the GetTag
rule is index-aware. This means that on each iteration of the DataItem's rules,
GetTag tries to obtain that iteration's occurrence of the StartTag and
EndTag values. So on the first iteration, the second GetTag rule retrieves the
text between the first occurrence of <b> and </b>. But on the second iteration
that rule tries to retrieve the text between the second occurrence of <b> and
</b>. Since there is no second occurrence of <b> and </b> in the DaySource
container, the rule retrieves nothing.
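The index-aware behavior described above can be modeled in a few lines of Python. This is only a sketch of the behavior the text describes, not the Data Parser's actual implementation. The functions get_tag and run_sibling_rules and the sample page markup are hypothetical, invented for illustration.

```python
def get_tag(source: str, start_tag: str, end_tag: str, occurrence: int) -> str:
    """Hypothetical model of an index-aware GetTag rule.

    Returns the text between the occurrence-th (0-based) pair of
    start_tag and end_tag in source, or "" if no such pair exists.
    """
    pos = 0
    for _ in range(occurrence + 1):
        start = source.find(start_tag, pos)
        if start == -1:
            return ""
        start += len(start_tag)
        end = source.find(end_tag, start)
        if end == -1:
            return ""
        pos = end + len(end_tag)
    return source[start:end]


def run_sibling_rules(source: str, num_records: int) -> list:
    """Apply both rules as siblings, each using the iteration index."""
    rows = []
    for i in range(num_records):
        # First rule: the i-th <td ...>...</td> block becomes DaySource.
        day_source = get_tag(source, "<td ", "</td>", i)
        # Second rule: looks for the i-th <b>...</b> pair inside DaySource,
        # but each DaySource contains only one such pair.
        day_part = get_tag(day_source, "<b>", "</b>", i)
        rows.append(day_part)
    return rows


# Made-up sample input: three day parts, each with a single <b>...</b> pair.
page = (
    '<td class="day"><b>Today</b> Sunny, 72</td>'
    '<td class="day"><b>Tonight</b> Clear, 55</td>'
    '<td class="day"><b>Friday</b> Rain, 64</td>'
)

rows = run_sibling_rules(page, 3)
print(rows)  # ['Today', '', '']
```

Only the first iteration yields a name. Every later iteration asks for a second or third <b> pair that its DaySource container does not have, which matches the failure described above.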
To achieve what we want, the second rule has to be made a child of the first
rule:
<DataItem NumRecords="9" ClearTableOnStart="True">
  <ParsingRule type="GetTag" Source="Source" result="DaySource">
    <StartTag><td </StartTag>
    <EndTag></td></EndTag>
    <ParsingRule type="GetTag" source="DaySource" result="DayPart">
      <StartTag><b></StartTag>
      <EndTag></b></EndTag>
    </ParsingRule>