Requirement 3: Page Single-Jump Conversion Rate
What is the page single-jump conversion rate? Suppose a user visits the page path 3, 5, 7, 9, 10, 21 within one Session: going from page 3 to page 5 is one single jump, and 7 to 9 is also one single jump. The single-jump conversion rate measures how likely each of these page-to-page transitions is. For example, to compute the 3-5 single-jump conversion rate: among the qualifying Sessions, let the number of visits (PV) to page 3 be A, and let the number of times page 3 was visited and immediately followed by page 5 be B; then B/A is the 3-5 page single-jump conversion rate.
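For intuition, here is a minimal Scala sketch of this definition. It derives the single jumps from the example path above and computes one conversion rate from hypothetical counts (A = 100 and B = 25 are made-up values, not data from the project):
object JumpRatioExample extends App {
  //The example page path of one Session
  val path = List(3, 5, 7, 9, 10, 21)

  //Adjacent pairs are the single jumps: 3-5, 5-7, 7-9, 9-10, 10-21
  val singleJumps: List[String] = path.zip(path.drop(1)).map { case (from, to) => s"$from-$to" }
  println(singleJumps.mkString(", "))

  //Hypothetical counts: page 3 was visited 100 times (A),
  //and 25 of those visits were immediately followed by page 5 (B)
  val a = 100L
  val b = 25L
  println(s"3-5 single-jump conversion rate = ${b.toDouble / a}") //0.25
}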
1 MySQL Table Creation
CREATE TABLE `jump_page_ratio` (
  `task_id`   TEXT,
  `page_jump` VARCHAR(20),
  `ratio`     DECIMAL(10,3)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
2 Approach
(1) Read the target jump sequence from the configuration file;
(1,2,3,4,5,6,7)
(2) Derive the single-page ids and the jump ids from the target jump sequence (see the sketch after this list);
single-page ids: 1,2,3,4,5,6
jump ids: 1-2,2-3,3-4,4-5,5-6,6-7
(3) Count how many times each single page was visited and keep only the target pages; these are the denominators;
(1,100),(2,125),(3,150)…
(4) Build the page jumps within each Session and keep only the target jumps; these are the numerators;
(1-2,50),(2-3,80),(3-4,59)…
(5) Compute the conversion rates;
count of (1-2) divided by count of 1
(6) Write the results to MySQL.
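A minimal sketch of steps (1) and (2), assuming the configured target page flow is the string "1,2,3,4,5,6,7" (the same shape the full code below reads from the targetPageFlow field of condition.params.json):
object TargetFlowSketch extends App {
  //Example value of the configured target page flow
  val targetPageFlow = "1,2,3,4,5,6,7"
  val flow: Array[String] = targetPageFlow.split(",")

  //Single-page ids used as denominators: 1,2,3,4,5,6
  val singlePages: Array[String] = flow.dropRight(1)

  //Jump ids used as numerators: 1-2,2-3,3-4,4-5,5-6,6-7
  val jumps: Array[String] = singlePages.zip(flow.drop(1)).map { case (from, to) => s"$from-$to" }

  println(singlePages.mkString(","))
  println(jumps.mkString(","))
}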
3 Code Implementation
SingleJumpApp:
package com.atguigu.app

import java.util.{Properties, UUID}

import com.alibaba.fastjson.{JSON, JSONObject}
import com.atguigu.datamode.UserVisitAction
import com.atguigu.utils.{JdbcUtil, PropertiesUtil}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SingleJumpApp {

  def main(args: Array[String]): Unit = {

    //1. Create the SparkSession
    val spark: SparkSession = SparkSession.builder()
      .appName("SingleJumpApp")
      .enableHiveSupport()
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    //2. Read the configuration file
    val properties: Properties = PropertiesUtil.load("conditions.properties")
    val conditionStr: String = properties.getProperty("condition.params.json")
    val conditionObj: JSONObject = JSON.parseObject(conditionStr)

    //3. Extract the target page flow
    val targetPageFlow: String = conditionObj.getString("targetPageFlow")
    val targetPageFlowArray: Array[String] = targetPageFlow.split(",")

    //4. Pages whose single-page click counts are needed
    //1,2,3,4,5,6
    val singlePageArray: Array[String] = targetPageFlowArray.dropRight(1)
    //2,3,4,5,6,7
    val pageArray: Array[String] = targetPageFlowArray.drop(1)

    //5. Build the target jumps
    val targetJumpPage: Array[String] = singlePageArray.zip(pageArray).map {
      case (x, y) =>
        s"$x-$y"
    }

    //6. Read the data from Hive
    val userVisitActionRDD: RDD[UserVisitAction] = spark.sql("select * from user_visit_action").as[UserVisitAction].rdd
    userVisitActionRDD.cache()

    //7. Keep only the actions on the target single pages
    val filterPageRDD: RDD[UserVisitAction] = userVisitActionRDD.filter(userVisitAction =>
      singlePageArray.contains(userVisitAction.page_id.toString)
    )

    //8. Count the clicks per page (denominators)
    val singlePageCount: collection.Map[String, Long] = filterPageRDD.map(userVisitAction => (userVisitAction.page_id.toString, 1)).countByKey()

    //9. Group the actions by Session
    val sessionToUserActionRDD: RDD[(String, Iterable[(String, Long)])] = userVisitActionRDD.map(userVisitAction => (userVisitAction.session_id, (userVisitAction.action_time, userVisitAction.page_id))).groupByKey()

    //10. Sort each Session by action time and build the jumps
    val jumpPageAndOne: RDD[(String, Long)] = sessionToUserActionRDD.flatMap { case (session, items) =>
      val pageIds: List[String] = items.toList.sortBy(_._1).map(_._2.toString)
      val fromPageIds: List[String] = pageIds.dropRight(1)
      val toPageIds: List[String] = pageIds.drop(1)
      //Build the single jumps
      val jumpPage: List[String] = fromPageIds.zip(toPageIds).map { case (fromPage, toPage) =>
        s"$fromPage-$toPage"
      }
      //Keep only the target jumps
      val filterJumpPageList: List[String] = jumpPage.filter(targetJumpPage.contains)
      filterJumpPageList.foreach(println)
      //Return (jump, 1) pairs
      filterJumpPageList.map((_, 1L))
    }

    //11. Count the jumps (numerators)
    val jumpPageCount: collection.Map[String, Long] = jumpPageAndOne.countByKey()

    //12. Generate the TaskID
    val taskID: String = UUID.randomUUID().toString

    //13. Compute the single-jump conversion rates
    val jumpPageRatio: Iterable[Array[Any]] = jumpPageCount.map { case (jumpPage, count) =>
      Array(taskID, jumpPage, count.toDouble / singlePageCount.getOrElse(jumpPage.split("-")(0), 1L))
    }

    //14. Write the results to MySQL
    JdbcUtil.executeBatchUpdate("insert into jump_page_ratio values(?,?,?)", jumpPageRatio)

    spark.close()
  }
}
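The program depends on the project's JdbcUtil helper, whose implementation is not shown in this section. The sketch below only illustrates what executeBatchUpdate might look like; the JDBC URL, credentials, and the absence of connection pooling are assumptions, not the project's actual code:
package com.atguigu.utils

import java.sql.{Connection, DriverManager, PreparedStatement}

object JdbcUtil {

  //Executes the given parameterized SQL once per Array of column values.
  //The JDBC URL and credentials are placeholders for this sketch.
  def executeBatchUpdate(sql: String, paramsList: Iterable[Array[Any]]): Unit = {
    val connection: Connection = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/sparkmall?useUnicode=true&characterEncoding=utf8",
      "root",
      "password")
    try {
      connection.setAutoCommit(false)
      val statement: PreparedStatement = connection.prepareStatement(sql)
      paramsList.foreach { params =>
        params.zipWithIndex.foreach { case (value, index) =>
          statement.setObject(index + 1, value.asInstanceOf[AnyRef])
        }
        statement.addBatch()
      }
      statement.executeBatch()
      connection.commit()
      statement.close()
    } finally {
      connection.close()
    }
  }
}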