Here is the problem:
I'm trying to match every <p> that is followed by two lowercase letters, except when the second character is a ). The following regular expression does the trick:
<p>[a-z][^\);]
Now I want to exclude the two letters from the capture, so only <p> is captured. I've been trying to use a positive lookahead but it's not working:
q(?=<p>[a-z][^\);])
Anyone know how I can use a look-around assertion on this problem?
Here is a little sample text:
<p>Regulation arranged at the auto-interconnection in the transformer (the low-voltage side-line</p>
<p>terminal) requires the tapping winding and tapchanger to be designed with the insulation level</p>
<p>of the X-terminal. They will be directly exposed to steep-front voltage transients from lightning</p>
<p>or switching surges. Figure 7 shows a number of different arrangements.</p>
<p>a) The number of turns in the common winding remains unchanged. This is a logical choice if the low-voltage</p>
<p>system voltage remains relatively constant while the high-voltage system voltage is more variable.</p>
<p>b) This alternative is the opposite to a). The number of turns facing the high-voltage system voltage remains</p>
<p>constant, while the effective number of turns of the low-voltage side varies.</p>
I want to capture the <p> on lines 4 and 6.
>>57034139
Never mind, I figured it out.
<p>(?=[a-z][^\);])
I feel like a retard.
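For anyone who wants to check a lookahead like this against the sample text, here is a minimal sketch in Python's re module. The names SAMPLE, PATTERN, SAME_POSITION and matching_lines are made up for the illustration. It assumes both characters should be tested inside a single lookahead, because two lookaheads written back to back both inspect the character right after <p> and never reach the second one.

```python
import re

# The sample text from the first post, one paragraph per line.
SAMPLE = """\
<p>Regulation arranged at the auto-interconnection in the transformer (the low-voltage side-line</p>
<p>terminal) requires the tapping winding and tapchanger to be designed with the insulation level</p>
<p>of the X-terminal. They will be directly exposed to steep-front voltage transients from lightning</p>
<p>or switching surges. Figure 7 shows a number of different arrangements.</p>
<p>a) The number of turns in the common winding remains unchanged. This is a logical choice if the low-voltage</p>
<p>system voltage remains relatively constant while the high-voltage system voltage is more variable.</p>
<p>b) This alternative is the opposite to a). The number of turns facing the high-voltage system voltage remains</p>
<p>constant, while the effective number of turns of the low-voltage side varies.</p>
"""

# One lookahead covering both characters: a lowercase letter, then
# anything except ')' or ';'. Only '<p>' itself is consumed.
PATTERN = re.compile(r"<p>(?=[a-z][^\);])")

# Two separate lookaheads: both are evaluated at the same position,
# so only the first character after '<p>' is ever tested.
SAME_POSITION = re.compile(r"<p>(?=[a-z])(?=[^\);])")


def matching_lines(pattern, text):
    """Return the 1-based numbers of the lines on which pattern matches."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


print(matching_lines(PATTERN, SAMPLE))        # [2, 3, 4, 6, 8]
print(matching_lines(SAME_POSITION, SAMPLE))  # [2, 3, 4, 5, 6, 7, 8]
```

With the single lookahead, the a) and b) lines are skipped and every match is exactly <p>. Note that on this sample it also hits lines 2, 3 and 8, since they start with two lowercase letters as well.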
>>57034139
use a fucking XML lib
>>57034234
>>57034293
Cute that you need an image of text that someone else put together to support yourself.
>it's another let's parse xml with regex episode
Thought they put that series off years ago.
>>57034234
Don't know about OP's use case, but HTML is rarely valid XML because web devs can't be bothered.
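If strict XML parsing chokes on the markup, a lenient HTML parser is one middle ground. A rough sketch with Python's standard html.parser, assuming each paragraph's text is all that is needed; ParagraphCollector and continuation_paragraphs are hypothetical names for this example, not part of any library.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None means "not inside a <p>"

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly ends a paragraph that was never closed.
            self._flush()
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed

    def _flush(self):
        if self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None


def continuation_paragraphs(html):
    """Return paragraph texts starting with a lowercase letter followed by
    a character other than ')' or ';' (the same test as the regex above)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [text for text in parser.paragraphs if re.match(r"[a-z][^\);]", text)]


# The second <p> is deliberately left unclosed.
messy = "<p>or switching surges.</p>\n<p>a) The number of turns\n<p>system voltage</p>"
print(continuation_paragraphs(messy))
# ['or switching surges.', 'system voltage']
```

The regex now only has to look at plain text, so the tag matching is left to the parser.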
>>57034478
Beginning tags should usually be enough.