.net - What is the fastest, case insensitive, way to see if a string contains another string in C#?
Edit 2:
I have confirmed that the performance problems were due to the static function call into the StringExtensions class. Once that was removed, the IndexOf method is indeed the fastest way of accomplishing this.
What is the fastest, case insensitive, way to see if a string contains another string in C#? I have seen the accepted solution for the post here at Case insensitive 'Contains(string)', but I have done some preliminary benchmarking, and it seems that using that method results in orders of magnitude slower calls on larger strings (> 100 characters) whenever the test string cannot be found.
Here are the methods I know of:
IndexOf:
public static bool Contains(this string source, string toCheck, StringComparison comp)
{
    if (string.IsNullOrEmpty(toCheck) || string.IsNullOrEmpty(source))
        return false;
    return source.IndexOf(toCheck, comp) >= 0;
}
ToUpper:
source.ToUpper().Contains(toCheck.ToUpper());
Regex:
bool contains = Regex.Match("string search", "string", RegexOptions.IgnoreCase).Success;
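To make the three candidates easier to compare side by side, here is a minimal sketch that wraps each one in a method with the same signature. The class name ContainsDemo and the method names are hypothetical, chosen only for illustration. The regex variant also passes the search text through Regex.Escape, which the one-liner above does not do, so that characters such as '.' are matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class ContainsDemo
{
    // IndexOf with an explicit comparison; no intermediate strings are created.
    public static bool ViaIndexOf(string source, string toCheck)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(toCheck))
            return false;
        return source.IndexOf(toCheck, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Upper-cases both operands first, which allocates two new strings per call.
    // Assumes neither argument is null.
    public static bool ViaToUpper(string source, string toCheck)
    {
        return source.ToUpper().Contains(toCheck.ToUpper());
    }

    // Regex.Escape turns the search text into a literal pattern.
    public static bool ViaRegex(string source, string toCheck)
    {
        return Regex.IsMatch(source, Regex.Escape(toCheck), RegexOptions.IgnoreCase);
    }
}
```

All three return the same answer for plain ASCII input such as ("String Search", "STRING"); they differ in how much work and allocation they need to get there.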
So my question is, which is the fastest way on average, and why so?
Edit:
Here is the simple test app I used to highlight the performance difference. Using this, I see 16 ms for ToLower(), 18 ms for ToUpper(), and 140 ms for StringExtensions.Contains():
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;

namespace ScratchConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            // Re-run the test on every line of input until "exit" is typed.
            string input = "";
            while (input != "exit")
            {
                RunTest();
                input = Console.ReadLine();
            }
        }

        static void RunTest()
        {
            List<string> s = new List<string>();
            string containsString = "1";
            bool found;
            DateTime now;

            // Build 50,000 strings that never contain the search text.
            for (int i = 0; i < 50000; i++)
            {
                s.Add("aaaaaaaaaaaaaaaa aaaaaaaaaaaa");
            }

            now = DateTime.Now;
            foreach (string st in s)
            {
                found = st.ToLower().Contains(containsString);
            }
            Console.WriteLine("ToLower(): " + (DateTime.Now - now).TotalMilliseconds);

            now = DateTime.Now;
            foreach (string st in s)
            {
                found = st.ToUpper().Contains(containsString);
            }
            Console.WriteLine("ToUpper(): " + (DateTime.Now - now).TotalMilliseconds);

            now = DateTime.Now;
            foreach (string st in s)
            {
                found = StringExtensions.Contains(st, containsString, StringComparison.OrdinalIgnoreCase);
            }
            Console.WriteLine("StringExtensions.Contains(): " + (DateTime.Now - now).TotalMilliseconds);
        }
    }

    public static class StringExtensions
    {
        public static bool Contains(this string source, string toCheck, StringComparison comp)
        {
            return source.IndexOf(toCheck, comp) >= 0;
        }
    }
}
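The test app above times each loop with DateTime.Now, whose resolution is coarse for runs of only a few milliseconds. A common alternative is System.Diagnostics.Stopwatch. The following is a rough sketch of such a timing helper, not part of the original test app; the class name Timing and the method name Measure are hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; the names are illustrative only.
public static class Timing
{
    // Runs the action once as a warm-up, then `iterations` more times,
    // and returns the elapsed time of the timed iterations only.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not counted

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

It could be called as Timing.Measure(() => st.IndexOf("1", StringComparison.OrdinalIgnoreCase), 50000) for each candidate in turn. The absolute numbers will still vary with machine, runtime version, and build configuration.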
Since ToUpper would result in a new string being created, StringComparison.OrdinalIgnoreCase would be faster. Also, regex has a lot of overhead for a simple compare like this. That said, String.IndexOf(String, StringComparison.OrdinalIgnoreCase) should be the fastest, since it does not involve creating new strings.
I would guess (there I go again) that regex has the better worst case because of how it evaluates the string. IndexOf will always do a linear search, and I'm guessing (and again) that regex is using something a little better. Regex should also have a best case that is close to, though not as good as, IndexOf (due to the additional complexity in its language).
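If regex is used anyway, part of its overhead comes from handling the pattern on each call. One way to reduce that, sketched below under the assumption that the same search text is reused across many source strings, is to build a Regex instance once and keep it. The class name LiteralMatcher is hypothetical. Whether RegexOptions.Compiled pays off depends on how many matches are run, so this is something to measure rather than assume.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper; the class and member names are illustrative only.
public sealed class LiteralMatcher
{
    private readonly Regex _regex;

    public LiteralMatcher(string toCheck)
    {
        // Escape so the search text is matched literally.
        // Compiled trades a higher construction cost for faster matching.
        _regex = new Regex(
            Regex.Escape(toCheck),
            RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    // Returns true when the search text occurs anywhere in source, ignoring case.
    public bool IsContainedIn(string source)
    {
        return _regex.IsMatch(source);
    }
}
```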
15,000 length string, 10,000 loop
00:00:00.0156251 IndexOf-OrdinalIgnoreCase
00:00:00.1093757 Regex-IgnoreCase
00:00:00.9531311 IndexOf-ToUpper
00:00:00.9531311 IndexOf-ToLower

Placement in the string makes a huge difference:

At start:
00:00:00.6250040 Match
00:00:00.0156251 IndexOf
00:00:00.9687562 ToUpper
00:00:01.0000064 ToLower

At end:
00:00:00.5781287 Match
00:00:01.0468817 IndexOf
00:00:01.4062590 ToUpper
00:00:01.4218841 ToLower

Not found:
00:00:00.5625036 Match
00:00:01.0000064 IndexOf
00:00:01.3750088 ToUpper
00:00:01.3906339 ToLower
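The inputs behind these timings are not shown, so the following is only a sketch of how the three placement cases could be constructed. The filler character 'a', the search text, and the names PlacementDemo, Build, and Find are all assumptions made for illustration. It shows why placement matters for IndexOf: the search stops at the first match, so a match at index 0 ends almost immediately, while a match at the end or no match at all requires scanning the whole string.

```csharp
using System;

// Hypothetical helper; the names and the filler character are illustrative only.
public static class PlacementDemo
{
    // Builds a string of `length` filler characters with `needle` written
    // at `position`, or with no needle at all when position is negative.
    public static string Build(int length, string needle, int position)
    {
        char[] chars = new string('a', length).ToCharArray();
        if (position >= 0)
        {
            needle.CopyTo(0, chars, position, needle.Length);
        }
        return new string(chars);
    }

    // Case-insensitive search; returns the index of the first match or -1.
    public static int Find(string haystack, string needle)
    {
        return haystack.IndexOf(needle, StringComparison.OrdinalIgnoreCase);
    }
}
```

With a 15,000 character haystack and the search text "XYZ", Find reports 0 for the start case, 14997 for the end case, and -1 when the text is absent.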