First: This console application reads an HTML file and extracts its first TITLE element.
Then: The program prints the title to the console. The file Problem.html must be present in the current directory; otherwise File.ReadAllText throws a FileNotFoundException.
Pattern: The regular expression matches a start tag and an end tag and captures the text between them. Whitespace directly after the start tag and directly before the end tag is left out of the captured string.
C# program that gets TITLE element from HTML
using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Read in an HTML file.
        string html = File.ReadAllText("Problem.html");
        // Get the title of the HTML.
        Console.WriteLine(GetTitle(html));
        // End.
        Console.ReadLine();
    }

    /// <summary>
    /// Get title from an HTML string.
    /// </summary>
    static string GetTitle(string file)
    {
        Match m = Regex.Match(file, @"<title>\s*(.+?)\s*</title>");
        if (m.Success)
        {
            return m.Groups[1].Value;
        }
        else
        {
            return "";
        }
    }
}
Output
Title of the Page
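Pattern:
@           Verbatim string literal. Backslashes need no escaping.
\s*         Matches 0 or more whitespace characters.
(.+?)       Captures 1 or more characters but isn't greedy.
            Stops as soon as the rest of the pattern can match.
\s*         Matches 0 or more whitespace characters.
Match       C# regular expression match object.
Groups[1]   First capturing group in the pattern.
            Groups[0] is the entire match, so captures start at 1.
Value       String value of the Group.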
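Sketch: The difference between the whole match and the first group in the table above can be seen in a small program. This is a minimal illustration, not part of the original example; the GroupDemo class and the GetGroup helper are hypothetical names introduced here.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Same pattern as in the example program.
    const string Pattern = @"<title>\s*(.+?)\s*</title>";

    // Hypothetical helper: returns the text of the requested group,
    // or an empty string when the pattern does not match.
    public static string GetGroup(string html, int index)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups[index].Value : "";
    }

    static void Main()
    {
        string html = "<head><title>  Title of the Page  </title></head>";
        // Group 0 is the entire match, tags and padding included.
        Console.WriteLine("[" + GetGroup(html, 0) + "]");
        // Group 1 is only the text captured by (.+?).
        Console.WriteLine("[" + GetGroup(html, 1) + "]");
    }
}
```

The first line printed keeps the tags and the surrounding spaces. The second contains only the title text, because the two \s* parts consume the padding outside the group.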
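Also: The pattern assumes the title tags are lowercase. Passing RegexOptions.IgnoreCase to Regex.Match removes that restriction. Because . does not match a newline by default, a title that spans several lines is matched only if RegexOptions.Singleline is used as well.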
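Sketch: One way to drop the lowercase assumption is to pass RegexOptions.IgnoreCase, and adding RegexOptions.Singleline lets the title span several lines. The following is a possible variant under those assumptions; TitleReader and GetTitleRelaxed are hypothetical names, not part of the original program.

```csharp
using System;
using System.Text.RegularExpressions;

static class TitleReader
{
    // Hypothetical variant of GetTitle: ignores the case of the tags
    // and allows the title text to contain newlines.
    public static string GetTitleRelaxed(string html)
    {
        Match m = Regex.Match(
            html,
            @"<title>\s*(.+?)\s*</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        // Uppercase tags, with the title text on its own line.
        string html = "<HTML><HEAD><TITLE>\n  Title of the Page\n</TITLE></HEAD></HTML>";
        Console.WriteLine(GetTitleRelaxed(html));
    }
}
```

With Singleline, newlines inside the title text itself are kept in the returned string. Only the whitespace at the two ends is removed.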
Paragraphs: You can use similar regular expressions to read other simple elements, such as paragraphs, from your HTML.
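Sketch: The same approach might be applied to paragraphs with Regex.Matches, which returns every match rather than only the first. This is an illustrative sketch; ParagraphReader and GetParagraphs are hypothetical names, and the pattern assumes P tags without attributes and without nesting.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ParagraphReader
{
    // Hypothetical helper: returns the inner text of each <p>...</p> element.
    // Assumes the start tag has no attributes.
    public static List<string> GetParagraphs(string html)
    {
        var result = new List<string>();
        MatchCollection matches = Regex.Matches(
            html,
            @"<p>\s*(.+?)\s*</p>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in matches)
        {
            // Group 1 holds the text between the tags, without outer whitespace.
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    static void Main()
    {
        string html = "<p>First paragraph.</p>\n<P> Second paragraph. </P>";
        foreach (string p in GetParagraphs(html))
        {
            Console.WriteLine(p);
        }
    }
}
```

For HTML with attributes on the tags or with nested markup, a pattern this simple will miss or mangle elements, so it fits best when the input format is known in advance.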