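In the previous post we installed the redis package for Python; now it finally comes in handy.
First start the Redis server. This step is required so that we can store and read data.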
$ redis-server
We can then use redis-cli to enter command mode:
$ redis-cli
From Python, we can talk to the server like this:
>>> import redis
>>> r = redis.StrictRedis(host='localhost', port=6379, db=0)
>>> r.set('foo', 'bar')
True
>>> r.get('foo')
'bar'
Of course, this is only a simple example. Against our actual database, querying a particular user's data looks like this:
$ python
Python 2.7.6 (default, Apr 12 2014, 22:23:28)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import redis
>>> pool = redis.ConnectionPool(host='localhost', port=6379, db=1)
>>> r = redis.Redis(connection_pool=pool)
>>> pipe = r.pipeline()
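Because we used db=1 when we set up the database, the connection differs from the first example. Next we define a small helper function to save some typing; it is taken from osrc.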
>>> def format_key(key):
...     return "{0}:{1}".format("od", key)
...
>>>
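To make the key layout concrete, here is a minimal sketch of how the helper composes the key names queried in this post. The "od" prefix and the "user:{name}:lang" pattern come from the text; `user_lang_key` is a hypothetical convenience wrapper added only for illustration and is not part of osrc.

```python
def format_key(key):
    # Prefix every key with the "od" namespace, as in the helper above.
    return "{0}:{1}".format("od", key)


def user_lang_key(username):
    # Hypothetical wrapper: name of the sorted set that holds
    # one user's per-language counts.
    return format_key("user:{0}:lang".format(username))


print(user_lang_key("gmszone"))  # od:user:gmszone:lang
```

Since the import code later in this post stores user names via `actor.lower()`, lookups presumably need lowercased names as well.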
Now we can query our data:
>>> pipe.zcard(format_key("user:{0}:lang".format("gmszone")))
<redis.client.Pipeline object at 0x10c7f6810>
>>> pipe.execute()
[0]
>>> pipe.zcard(format_key("user:{0}:lang".format("dfm")))
<redis.client.Pipeline object at 0x10c7f6810>
>>> pipe.execute()
[1]
>>>
OK, I did not commit any code on March 1 and 2. Let's try db=0:
import redis
r = redis.StrictRedis(host='localhost', port=6379, db=1)

def _format(key):
    return "{0}:{1}".format("od", key)

pipe = r.pipeline()
pipe.incr(_format("total"), 1)
pipe.execute()
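This way we can store data quite simply.
However, it is a painful process, because there is a lot of data.
On Mac OS you can browse the data with rdm (Redis Desktop Manager), but with this much data that may not be practical.
So let's grit our teeth and store the data the same way osrc does.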
import re
import os
import glob
import gzip
import json
from datetime import date

def build_db_with_redis():
    year = 2014
    month = 3
    pipe = r.pipeline()
    for day in range(2, 4):
        # Archive files are named year-month-day-hour.json.gz.
        date_re = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})-([0-9]+)\.json\.gz")
        fn_template = os.path.join("march",
                                   "{year}-{month:02d}-{day:02d}-{n}.json.gz")
        kwargs = {"year": year, "month": month, "day": day, "n": "*"}
        filenames = glob.glob(fn_template.format(**kwargs))

        for filename in filenames:
            year, month, day, hour = map(int, date_re.findall(filename)[0])
            weekday = date(year=year, month=month, day=day).strftime("%w")

            with gzip.GzipFile(filename) as f:
                events = [line.decode("utf-8", errors="ignore") for line in f]

            for line in events:
                event = json.loads(line)

                actor = event["actor"]
                attrs = event.get("actor_attributes", {})
                if actor is None or attrs.get("type") != "User":
                    # This was probably an anonymous event (like a gist event)
                    # or an organization event.
                    continue

                key = actor.lower()
                evttype = event["type"]
                nevents = 1
                contribution = evttype in ["IssuesEvent", "PullRequestEvent",
                                           "PushEvent"]

                pipe.incr(_format("total"), nevents)
                pipe.hincrby(_format("day"), weekday, nevents)
                pipe.hincrby(_format("hour"), hour, nevents)
                pipe.zincrby(_format("user"), key, nevents)
                pipe.zincrby(_format("event"), evttype, nevents)

                # Event histograms.
                pipe.hincrby(_format("event:{0}:day".format(evttype)), weekday,
                             nevents)
                pipe.hincrby(_format("event:{0}:hour".format(evttype)), hour,
                             nevents)

                # User schedule histograms.
                pipe.hincrby(_format("user:{0}:day".format(key)), weekday, nevents)
                pipe.hincrby(_format("user:{0}:hour".format(key)), hour, nevents)

                # User event type histogram.
                pipe.zincrby(_format("user:{0}:event".format(key)), evttype,
                             nevents)
                pipe.hincrby(_format("user:{0}:event:{1}:day".format(key,
                                                                     evttype)),
                             weekday, nevents)
                pipe.hincrby(_format("user:{0}:event:{1}:hour".format(key,
                                                                      evttype)),
                             hour, nevents)

                # Parse the name and owner of the affected repository.
                repo = event.get("repository", {})
                owner, name, org = (repo.get("owner"), repo.get("name"),
                                    repo.get("organization"))
                if owner and name:
                    repo_name = "{0}/{1}".format(owner, name)
                    pipe.zincrby(_format("repo"), repo_name, nevents)

                    # Save the social graph.
                    pipe.zincrby(_format("social:user:{0}".format(key)),
                                 repo_name, nevents)
                    pipe.zincrby(_format("social:repo:{0}".format(repo_name)),
                                 key, nevents)

                    # Do we know what the language of the repository is?
                    language = repo.get("language")
                    if language:
                        # Which are the most popular languages?
                        pipe.zincrby(_format("lang"), language, nevents)

                        # Total number of pushes.
                        if evttype == "PushEvent":
                            pipe.zincrby(_format("pushes:lang"), language, nevents)

                        pipe.zincrby(_format("user:{0}:lang".format(key)),
                                     language, nevents)

                        # Who are the most important users of a language?
                        if contribution:
                            pipe.zincrby(_format("lang:{0}:user".format(language)),
                                         key, nevents)

            # Run the commands queued for this file.
            pipe.execute()
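Two small pieces of the import function can be tried without a Redis server: deriving the weekday and hour from an archive file name, and deciding which events to keep. The sketch below isolates both. The function names `parse_archive_name` and `is_user_event`, the sample file names, and the sample events are made up for illustration; the pattern is the one from the import function, with both dots escaped.

```python
import re
from datetime import date

# Pattern for archive names of the form year-month-day-hour.json.gz.
date_re = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})-([0-9]+)\.json\.gz")


def parse_archive_name(filename):
    # Returns (weekday, hour). The weekday comes from strftime("%w"),
    # so it is a string where "0" means Sunday.
    year, month, day, hour = map(int, date_re.findall(filename)[0])
    weekday = date(year=year, month=month, day=day).strftime("%w")
    return weekday, hour


def is_user_event(event):
    # Keep only events that have an actor whose type is "User";
    # anonymous and organization events are skipped.
    actor = event.get("actor")
    attrs = event.get("actor_attributes", {})
    return actor is not None and attrs.get("type") == "User"


# Hypothetical file name following the template used above.
print(parse_archive_name("march/2014-03-02-15.json.gz"))  # ('0', 15)

# Made-up events, only to exercise the filter.
print(is_user_event({"actor": "gmszone",
                     "actor_attributes": {"type": "User"}}))  # True
print(is_user_event({"actor": None}))                         # False
```

The weekday and hour returned here are the field names that end up in the "day" and "hour" hashes.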
Then, in the next chapter, we can pull out our own data:
[221.0, {'1': '50', '0': '41', '3': '13', '2': '33', '5': '28', '4': '22', '6': '34'}, [('PushEvent', 152.0), ('CreateEvent', 39.0), ('WatchEvent', 16.0), ('GollumEvent', 8.0), ('MemberEvent', 3.0), ('ForkEvent', 2.0), ('ReleaseEvent', 1.0)], 0, 0, 0, 11, [('CSS', 73.0), ('JavaScript', 60.0), ('Ruby', 12.0), ('TeX', 6.0), ('Python', 5.0), ('Java', 5.0), ('C++', 5.0), ('Assembly', 5.0), ('Emacs Lisp', 2.0), ('Arduino', 2.0), ('C', 1.0)]]
Here is a brief list of the methods used above:
incr(name, amount=1)
Increments the value of key by amount. If no key exists, the value will be initialized as amount. In effect, it is a self-incrementing counter.
hincrby(name, key, amount=1)
Increment the value of key in hash name by amount
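zincrby(name, value, amount=1)
Increment the score of value in sorted set name by amount. This is the redis-py 2.x argument order; from redis-py 3.0 onward the signature is zincrby(name, amount, value).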
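To see what the three methods do to the data, here is a rough in-memory imitation using plain dictionaries. It is only a sketch of the semantics described above, not how Redis or redis-py implement them, and the three module-level stores are hypothetical names.

```python
from collections import defaultdict

# Hypothetical in-memory stores standing in for Redis data types.
counters = defaultdict(int)                             # string key -> int
hashes = defaultdict(lambda: defaultdict(int))          # hash -> field -> int
sorted_sets = defaultdict(lambda: defaultdict(float))   # set -> member -> score


def incr(name, amount=1):
    # A missing key starts at 0, so the first call yields `amount`.
    counters[name] += amount
    return counters[name]


def hincrby(name, key, amount=1):
    # Increment one field inside the hash `name`.
    hashes[name][key] += amount
    return hashes[name][key]


def zincrby(name, value, amount=1):
    # Increment the score of member `value` in the sorted set `name`.
    sorted_sets[name][value] += amount
    return sorted_sets[name][value]


incr("od:total")
hincrby("od:day", "1")
zincrby("od:lang", "CSS")
zincrby("od:lang", "CSS")
print(counters["od:total"], hashes["od:day"]["1"], sorted_sets["od:lang"]["CSS"])
# 1 1 2.0
```

Sorted-set scores are floats, which is why the language and event counts in the result appear as values such as 73.0.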
Note that execute is what actually runs the queued commands:
>>> pipe.execute()
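The queue-then-run behaviour can be illustrated with a tiny stand-in class. `FakePipeline` below is hypothetical and exists only for this example; it mimics the two things visible in the sessions above, namely that a command call returns the pipeline object itself and that execute returns one result per queued command.

```python
class FakePipeline(object):
    """Hypothetical in-memory stand-in for a pipeline; not redis-py code."""

    def __init__(self):
        self.store = {}
        self.queue = []

    def incr(self, name, amount=1):
        # Only record the command; nothing is applied yet.
        self.queue.append((name, amount))
        return self

    def execute(self):
        # Apply every queued command in order and collect the results.
        results = []
        for name, amount in self.queue:
            self.store[name] = self.store.get(name, 0) + amount
            results.append(self.store[name])
        self.queue = []
        return results


pipe = FakePipeline()
pipe.incr("od:total").incr("od:total", 2)
print(pipe.store)      # {}  (nothing has run yet)
print(pipe.execute())  # [1, 3]
print(pipe.store)      # {'od:total': 3}
```

In the import function the same idea keeps the many increments for one file queued together instead of issuing them one at a time.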
As for the data below:
[
221.0,
{
'1': '50',
'0': '41',
'3': '13',
'2': '33',
'5': '28',
'4': '22',
'6': '34'
},
[
('PushEvent', 152.0),
('CreateEvent', 39.0),
('WatchEvent', 16.0),
('GollumEvent', 8.0),
('MemberEvent', 3.0),
('ForkEvent', 2.0),
('ReleaseEvent', 1.0)],
0, 0, 0, 11,
[
('CSS', 73.0),
('JavaScript', 60.0),
('Ruby', 12.0),
('TeX', 6.0),
('Python', 5.0),
('Java', 5.0),
('C++', 5.0),
('Assembly', 5.0),
('Emacs Lisp', 2.0),
('Arduino', 2.0),
('C', 1.0)
]
]
This is exactly the GitHub data we need, the data we use to analyze users.
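As a quick sanity check on this result, the weekday histogram can be turned into something readable. The sketch below copies the hash from the result above; the variable names and the weekday labels are additions for illustration, assuming the keys are strftime("%w") values with "0" meaning Sunday, as in the import code.

```python
# Weekday histogram copied from the result above. Both the field names
# and the counts come back from Redis as strings.
day_hist = {'1': '50', '0': '41', '3': '13', '2': '33',
            '5': '28', '4': '22', '6': '34'}

names = ["Sunday", "Monday", "Tuesday", "Wednesday",
         "Thursday", "Friday", "Saturday"]

# Convert to (weekday name, count) pairs ordered from Sunday to Saturday.
counts = [(names[int(k)], int(v)) for k, v in sorted(day_hist.items())]
total = sum(n for _, n in counts)
busiest = max(counts, key=lambda item: item[1])

print(total)    # 221
print(busiest)  # ('Monday', 50)
```

The total of 221 agrees with the first value in the result, 221.0.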