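How can I scrape news in ASP.NET and save it to my own database?
It seems the WebRequest class is what you use for this. Could an expert please share some source code,
or send it to my email: wangyuyu247494692@qq.com.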
I would be very grateful!
-------------------- Programming Q&A --------------------
http://www.cftea.com/c/2007/08/H89S2ILKP2SPAKT7.asp
-------------------- Programming Q&A --------------------
No extraction step?
-------------------- Programming Q&A --------------------
Extract the content as required?
<%@ Page Language="C#" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<script runat="server">
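void Page_Load(object sender, EventArgs e)
{
    try
    {
        // Fetch the page; the using blocks dispose the response and reader even if an error occurs.
        WebRequest request = WebRequest.Create("http://www.baidu.com/");
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
        {
            // The encoding must match the target page's charset, otherwise the text comes out garbled.
            tb.Text = reader.ReadToEnd();
        }
    }
    catch (Exception ex)
    {
        // Show the error message in the text box instead of the page content.
        tb.Text = ex.Message;
    }
}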
</script>
<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
<title>Fetch web page content</title>
</head>
<body>
<form id="form1" runat="server">
<div>
<asp:TextBox ID="tb" runat="server" Width="500" Height="300" TextMode="multiLine"></asp:TextBox>
</div>
</form>
</body>
</html>
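You have already fetched the entire content of the page.
Extracting the part you need is then just a matter of string manipulation, isn't it?
-------------------- Programming Q&A --------------------
How you extract depends on your requirements: decide what content you need, look for patterns in the page you fetched, and pull it out with regular expressions!
-------------------- Programming Q&A --------------------
Still putting the code and the markup in the same file?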
-------------------- Programming Q&A --------------------
Use WebClient or HttpWebRequest to fetch the data, then a regular expression to pick out the specific data.
// url (string) and encoding (System.Text.Encoding) are assumed to be supplied by the caller.
System.Net.HttpWebRequest request = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);
// Send a browser-like User-Agent, since some sites reject requests that lack one.
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)";
string html;
using (System.Net.WebResponse response = request.GetResponse())
using (System.IO.Stream resStream = response.GetResponseStream())
using (System.IO.StreamReader sr = new System.IO.StreamReader(resStream, encoding))
{
    html = sr.ReadToEnd();
}
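// Alternative using WebClient. PageUrl (string) is assumed to be supplied by the caller.
string Content;
using (System.Net.WebClient wc = new System.Net.WebClient())
{
    wc.Credentials = System.Net.CredentialCache.DefaultCredentials;
    byte[] pageData = wc.DownloadData(PageUrl);
    // Encoding.Default depends on the runtime and system settings; use the page's actual charset if it differs.
    Content = System.Text.Encoding.Default.GetString(pageData);
}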
Regex reg = new Regex(@"(?i)(?<=<span.*?id=""s"".*?>)[^<]+(?=</span>)");
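A minimal sketch of how the pattern above might be applied to the fetched HTML. The class name `SpanExtractor` and the method `Extract` are made up for illustration; only `Regex` and `Regex.Matches` come from .NET itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpanExtractor
{
    // Same pattern as above: the text inside a <span> whose attributes include id="s".
    // .NET allows variable-length lookbehind, which this pattern relies on.
    private static readonly Regex SpanText = new Regex(
        @"(?i)(?<=<span.*?id=""s"".*?>)[^<]+(?=</span>)");

    // Returns the trimmed text of every match found in the HTML string.
    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match m in SpanText.Matches(html))
        {
            results.Add(m.Value.Trim());
        }
        return results;
    }
}
```

Calling `SpanExtractor.Extract(html)` with the downloaded page returns the matched strings. One caveat: because `.*?` can run past the end of the opening tag, a later `<span>` on the same line may also match, so tightening it to `[^>]*` may be worth trying if you get extra results.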
-------------------- Programming Q&A --------------------
Each page needs its own analysis.
Use a regular expression or a string search to find the title and body strings of the news page, then store them in the database.
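A rough sketch of that last step, under stated assumptions: the table `News` with columns `Url`, `Title`, and `Content`, the `@name` parameter syntax (SQL Server style), and the names `NewsSaver`, `ExtractTitle`, and `Save` are all hypothetical. The `<title>` pattern is only a stand-in for whatever pattern fits the real news page.

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

public static class NewsSaver
{
    // Pulls the text of the <title> element; returns "" when there is none.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
    }

    // Inserts one news row through an already opened connection.
    // Parameters are used rather than string concatenation, so scraped text
    // cannot break or inject into the SQL statement.
    public static int Save(IDbConnection conn, string url, string title, string content)
    {
        using (IDbCommand cmd = conn.CreateCommand())
        {
            // Hypothetical table and column names.
            cmd.CommandText =
                "INSERT INTO News (Url, Title, Content) VALUES (@url, @title, @content)";
            AddParam(cmd, "@url", url);
            AddParam(cmd, "@title", title);
            AddParam(cmd, "@content", content);
            return cmd.ExecuteNonQuery();
        }
    }

    private static void AddParam(IDbCommand cmd, string name, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

With SQL Server you would open a `SqlConnection` and pass it in, for example `NewsSaver.Save(conn, url, NewsSaver.ExtractTitle(html), body)`, where `body` is whatever your own pattern extracted. Taking `IDbConnection` keeps the sketch independent of a particular database provider.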