C# Title From HTML

This C# example program gets the HTML title from strings. It uses Regex.

Title from HTML. HTML documents have title elements.

The data in title elements is important. It is used for search-engine optimization and RSS feeds. This simple method extracts the contents of the TITLE element from an HTML document.

Example. We can extract the contents of the TITLE element from your HTML. This helps with SEO and with checking that your HTML is correct. After the code, we look at the parts of the Regex in detail and at some limitations.

C# program that gets TITLE element from HTML

using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Read in an HTML file.
        string html = File.ReadAllText("Problem");

        // Get the title of the HTML.
        Console.WriteLine(GetTitle(html));

        // End.
        Console.ReadLine();
    }

    /// <summary>
    /// Get title from an HTML string.
    /// </summary>
    static string GetTitle(string file)
    {
        Match m = Regex.Match(file, @"<title>\s*(.+?)\s*</title>");
        if (m.Success)
        {
            return m.Groups[1].Value;
        }
        else
        {
            return "";
        }
    }
}

Output

Title of the Page

This console application reads the HTML file, gets the first TITLE element from it, and prints the title text to the console. The specified HTML file must be present in the current directory, or File.ReadAllText throws an exception.
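
Because the program depends on the file being present, a guarded read can avoid the exception. The following is a minimal sketch and not part of the original program. The class name TitleReader, the method name TryReadTitle, and the choice to return null for a missing file are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class TitleReader
{
    // Hypothetical helper: returns the title text, an empty string when no
    // TITLE element is found, or null when the file does not exist.
    public static string TryReadTitle(string path)
    {
        if (!File.Exists(path))
        {
            return null;
        }

        string html = File.ReadAllText(path);

        // Same pattern as the program above.
        Match m = Regex.Match(html, @"<title>\s*(.+?)\s*</title>");
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

A caller could then print a message instead of crashing, for example by testing the result for null before writing it to the console.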

Regular expressions. Here we look at the parts of the regular expression used in the above C# program. The pattern looks for a start tag and an end tag. It also skips any whitespace between the tags and the title text, so the captured string is trimmed.

Symbol: Description

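@           Verbatim string literal prefix.
            Backslashes need no escaping.
<title>     Matches the literal start tag.
\s*         Matches 0 or more whitespace characters.
(.+?)       Captures 1 or more characters but isn't greedy.
            Stops as soon as it can.
\s*         Matches 0 or more whitespace characters.
</title>    Matches the literal end tag.

Match       Result object returned by Regex.Match.
Groups[1]   First captured group.
            Groups[0] is the entire match.
Value       String value of the Group.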
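
To make the difference between the whole match and the captured group concrete, here is a small sketch. The class name GroupsDemo and the method name Describe are hypothetical. The pattern is the one from the program above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Returns two strings: the entire match (Groups[0]) and the captured
    // title text (Groups[1]). Returns an empty array when nothing matches.
    public static string[] Describe(string html)
    {
        Match m = Regex.Match(html, @"<title>\s*(.+?)\s*</title>");
        if (!m.Success)
        {
            return new string[0];
        }
        return new[] { m.Groups[0].Value, m.Groups[1].Value };
    }
}
```

For the input "<title>  Title of the Page  </title>", the first string is the whole input including the tags and padding, and the second is "Title of the Page" with the surrounding whitespace removed.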

Errors. This code is not flexible enough for some HTML documents. For example, it won't match a TITLE start tag that has attributes, or a title whose text spans more than one line, because the dot does not match a newline by default. But the pattern should work for typical XHTML, where element names are lowercase.

Also: The pattern assumes the tags are lowercase, although this is easily changed with RegexOptions.IgnoreCase.

You can make more capable versions of this method. However, I kept the code simple on purpose. It is easier to detect small errors in your HTML if the code doesn't try to compensate for them.
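
For readers who do want a more tolerant version, one possible sketch follows. It is not the author's method. The class name TolerantTitle is hypothetical, and the pattern is an assumption about what "more capable" might mean here: it accepts uppercase tags, attributes on the start tag, and title text that spans several lines. It is still a regular expression and not an HTML parser, so unusual markup can defeat it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TolerantTitle
{
    // Hypothetical variant of GetTitle.
    // IgnoreCase lets <TITLE> match as well as <title>.
    // Singleline lets the dot match newlines, so multi-line titles work.
    // [^>]* skips any attributes on the start tag.
    // (.*?) also allows an empty title, which yields an empty string.
    public static string GetTitle(string html)
    {
        Match m = Regex.Match(
            html,
            @"<title\b[^>]*>\s*(.*?)\s*</title\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

The trade-off is the one described above: this version quietly accepts markup that the simple version would reject, so it is less useful for spotting mistakes in your own pages.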

Summary. We can capture the contents of the TITLE element from HTML documents using the C# language. Every webmaster should know that the TITLE is important. This helper method makes it easier to process.

Note: You can use regular expressions like these for reading important elements from your HTML.

TheDeveloperBlog.com
