Can anyone share C# source code for scraping data from web pages? A complete example would be best.
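--------------------Programming Q&A-------------------- This is the most basic kind of crawler. The idea is to download the page source with HttpWebRequest and HttpWebResponse, then cut out the parts you need with regular expressions.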
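To illustrate the "cut it out with regular expressions" step, here is a minimal sketch that pulls every link target out of HTML you have already downloaded. The names LinkExtractor and ExtractHrefs are made up for illustration, and the pattern assumes quoted href attributes. Regex is brittle on arbitrary HTML, so treat this as a quick tool for pages whose markup you know.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    // Group 1 captures the href value. Not a real HTML parser.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Returns every href value found in the HTML, in document order.
    public static List<string> ExtractHrefs(string html)
    {
        var result = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

Usage: after downloading a page into html, call LinkExtractor.ExtractHrefs(html) to get the list of link targets.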
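--------------------Programming Q&A--------------------
using System.IO;
using System.Net;
using System.Text;

// Download the HTML source of a page, decoding it with the given charset (e.g. "gb2312", "utf-8")
public string getSourceCode(string Url, string CharSet)
{
    try
    {
        HttpWebRequest wReq = (HttpWebRequest)WebRequest.Create(Url);
        // The property already names the header, so give only the value (no "User-Agent: " prefix)
        wReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";
        wReq.Accept = "*/*";
        wReq.KeepAlive = true;
        wReq.Headers.Add("Accept-Language", "zh-cn,en-us;q=0.5");
        // On .NET Core / .NET 5+, GB2312 and GBK need
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) to be called once at startup
        // using blocks close the response and stream so connections are not leaked
        using (WebResponse wResp = wReq.GetResponse())
        using (Stream respStream = wResp.GetResponseStream())
        using (StreamReader reader = new StreamReader(respStream, Encoding.GetEncoding(CharSet)))
        {
            return reader.ReadToEnd();
        }
    }
    catch
    {
        // Any failure (bad URL, network error, unknown charset) returns an empty string
        return "";
    }
}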
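getSourceCode only works if the caller already knows the page's charset; a wrong guess produces garbled Chinese text. One possible approach, sketched here with hypothetical names (CharsetSniffer, DetectCharset), is to read the charset the page declares in its own meta tag. It assumes the declaration appears within the first 4096 bytes and that the encoding is ASCII-compatible (as GB2312, GBK and UTF-8 are). It ignores the HTTP Content-Type header, which a fuller solution would check first.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Finds charset=... inside a <meta> tag, for example
    // <meta charset="utf-8"> or
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]+charset\s*=\s*[""']?\s*([\w-]+)",
        RegexOptions.IgnoreCase);

    // Returns the declared charset in lower case, or fallback if none is found.
    // Assumption: the tag is in the first 4096 bytes and the page encoding is
    // ASCII-compatible, so an ASCII decode is enough to read the tag itself.
    public static string DetectCharset(byte[] html, string fallback)
    {
        int length = Math.Min(html.Length, 4096);
        string head = Encoding.ASCII.GetString(html, 0, length);
        Match m = MetaCharset.Match(head);
        return m.Success ? m.Groups[1].Value.ToLowerInvariant() : fallback;
    }
}
```

Usage: download the raw bytes once, for example with new WebClient().DownloadData(url), pass them to DetectCharset, then decode with Encoding.GetEncoding(charset).GetString(bytes). Working on bytes avoids downloading the page twice just to find out how to decode it.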
Bro, could you please look at these two links and see whether either of them is correct?
http://www.webkaka.com/blog/archives/ASPNet-WebClient-WebRequest-HtmlCode.html
http://www.cnblogs.com/lorn/archive/2007/12/09/988507.html --------------------Programming Q&A--------------------
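Please also check this one and tell me which of them is right:
http://www.modo01.com/thread-183027-1-1.html --------------------Programming Q&A-------------------- Just search on Baidu; there are plenty of examples. Debug them yourself. --------------------Programming Q&A-------------------- http://www.2cto.com/kf/201106/93983.html This one covers what you need.
http://www.cnblogs.com/netwom/archive/2009/01/05/953430.html --------------------Programming Q&A--------------------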
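--------------------Programming Q&A--------------------
// Requires: using System.IO; using System.Net; using System.Text; using System.Text.RegularExpressions;
private void Page_Load(object sender, System.EventArgs e)
{
    // Put user code to initialize the page here
    if (!IsPostBack)
    {
        string html = GetWebContent("http://www.sina.com.cn");
        // Guard against an empty download or a page without <title>; indexing [1] directly would throw
        string[] parts = Regex.Split(html, "<title>", RegexOptions.IgnoreCase);
        string sTitle = parts.Length > 1
            ? Regex.Split(parts[1], "</title>", RegexOptions.IgnoreCase)[0]
            : "";
        // HtmlEncode so any markup in the scraped text is displayed, not rendered
        Response.Write(Server.HtmlEncode(sTitle));
    }
}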
// Get the HTML source of the page at the given URL
private string GetWebContent(string Url)
{
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
        request.Timeout = 30000;
        // Pragma: no-cache asks intermediate caches for a fresh copy
        request.Headers.Set("Pragma", "no-cache");
        // GB2312 is hard-coded; if the page uses another charset the text comes out garbled
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream streamReceive = response.GetResponseStream())
        using (StreamReader streamReader = new StreamReader(streamReceive, Encoding.GetEncoding("GB2312")))
        {
            return streamReader.ReadToEnd();
        }
    }
    catch
    {
        return "";
    }
}
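The code above doesn't get the article content or its date. Could someone experienced point me in the right direction? --------------------Programming Q&A--------------------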
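The Page_Load code only cuts out the title, so nothing else is extracted; the content and the date each need their own pattern written against the target page's markup. (If the page fills them in with JavaScript, they will not be in the downloaded HTML at all, since HttpWebRequest does not run scripts.) A minimal sketch follows. The names ArticleParser, ExtractContent and ExtractDate are hypothetical, and so is the markup: the div with class "content" and the yyyy-MM-dd date format are assumptions, not sina.com.cn's real markup, so adapt both patterns after viewing the actual page source.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class ArticleParser
{
    // Hypothetical markup: the body is assumed to sit in <div class="content">...</div>
    // with no nested <div>. Replace this pattern to match the real page.
    private static readonly Regex ContentPattern = new Regex(
        @"<div\s+class=""content""[^>]*>(.*?)</div>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // First date such as 2011-06-15 or 2011-06-15 08:30 anywhere in the page.
    private static readonly Regex DatePattern = new Regex(
        @"\d{4}-\d{2}-\d{2}(?:\s+\d{2}:\d{2})?");

    // Any HTML tag, used to strip markup from the extracted body.
    private static readonly Regex Tags = new Regex("<[^>]+>");

    // Returns the body text with tags removed and entities decoded, or "" if not found.
    public static string ExtractContent(string html)
    {
        Match m = ContentPattern.Match(html);
        if (!m.Success) return "";
        string text = Tags.Replace(m.Groups[1].Value, "");
        return WebUtility.HtmlDecode(text).Trim();
    }

    // Returns the first date-like string, or "" if none is found.
    public static string ExtractDate(string html)
    {
        Match m = DatePattern.Match(html);
        return m.Success ? m.Value : "";
    }
}
```

Usage: pass the string returned by GetWebContent to ExtractContent and ExtractDate. If both come back empty, first check that the downloaded HTML actually contains the text (view it or save it to a file), then adjust the patterns.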
--------------------Programming Q&A-------------------- Learned something, thanks. --------------------Programming Q&A-------------------- http://www.jb51.net/article/16618.htm
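// Requires: using System.IO; using System.Net; using System.Text; using System.Text.RegularExpressions;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.sina.com.cn");
// false: a 301/302 response is returned as-is instead of being followed to the final page
request.AllowAutoRedirect = false;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
{
    string strHtml = sr.ReadToEnd();
    // Lazy (.*?) stops at the first </title>; Singleline lets the title span several lines
    string strHead = Regex.Match(strHtml, "<title>(.*?)</title>", RegexOptions.IgnoreCase | RegexOptions.Singleline).Groups[1].Value;
    Response.Write(Server.HtmlEncode(strHead));
}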
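The snippet above sets AllowAutoRedirect = false, so if the server answers with a redirect, strHtml holds the short redirect response rather than the page you wanted. Either leave AllowAutoRedirect at its default (true), or follow the Location header yourself. A small sketch of the helper logic for the second option, with hypothetical names (RedirectHelper, IsRedirect, ResolveLocation):

```csharp
using System;

public static class RedirectHelper
{
    // True for the HTTP status codes that tell the client to go to the Location header.
    public static bool IsRedirect(int statusCode)
    {
        return statusCode == 301 || statusCode == 302 || statusCode == 303
            || statusCode == 307 || statusCode == 308;
    }

    // Location may be relative ("/news/") or absolute; resolve it against the requested URL.
    public static string ResolveLocation(string requestUrl, string location)
    {
        return new Uri(new Uri(requestUrl), location).AbsoluteUri;
    }
}
```

Usage: after GetResponse, if RedirectHelper.IsRedirect((int)response.StatusCode) is true, read response.Headers["Location"], resolve it with ResolveLocation, and issue a new request to that URL. Cap the number of hops (for example, five) so a redirect loop cannot run forever.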