K C

Regex for scraping content from html

Oct 16 2009 8:50 AM
Hi all,

I need a regex to match the text between <a> tags that are themselves nested inside <h3> tags, for example:

<h3><a href="">this text</a></h3>

I started with this: (?<=<h3>)[\s\S]*?(?=</h3>), which does match the text between the header tags. I also started playing around with <a.*?>[\s\S]*?</a> to retrieve links, which also works, but I am having problems combining the two into one pattern.

I might be going about this the wrong way altogether! I am new to regex and am still learning, so any help with this will be greatly appreciated!

Krishna

Answers (4)
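
One way the two patterns from the question might be combined is to drop the lookarounds and use a single capture group: match the <h3> tag, then the opening <a ...> tag, capture the link text, and then require </a></h3>. The sketch below is in Python only as an assumption, since the question does not say which regex engine is in use; the sample HTML string and the helper name h3_link_texts are hypothetical. Regex matching of HTML is fragile (nested tags, comments, unusual attributes), so treat this as a starting point rather than a robust scraper.

```python
import re

# Hypothetical sample input; the real page markup is not shown in the question.
html = '<h3><a href="">this text</a></h3><p><a href="x">other link</a></p>'

# <h3 ...>, optional whitespace, an opening <a ...> tag, the captured link text,
# then </a>, optional whitespace, and </h3>.
# The lazy [\s\S]*? stops at the first </a> that is followed by </h3>.
pattern = re.compile(
    r"<h3\b[^>]*>\s*<a\b[^>]*>([\s\S]*?)</a>\s*</h3>",
    re.IGNORECASE,
)

def h3_link_texts(source):
    """Return the text of every <a> that is directly wrapped by an <h3>."""
    # With exactly one capture group, findall returns the captured strings.
    return [text.strip() for text in pattern.findall(source)]

print(h3_link_texts(html))
```

The link inside the <p> is skipped because the pattern only matches when the <a> sits directly inside an <h3>. If the pages are more irregular than the example, an HTML parser is likely a safer choice than a single pattern.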