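In the previous post we installed redis-py, the Python client for Redis, and now it finally gets put to use.
This step is required so that we can store data and read it back:
$ redis-server
We can use redis-cli to enter command mode:
$ redis-cli
In Python, we can do this: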
>>> import redis
>>> r = redis.StrictRedis(host='localhost', port=6379, db=0)
>>> r.set('foo', 'bar')
True
>>> r.get('foo')
'bar'
Of course, this is only a simple example. Against our actual database, a session to look up one user's data goes like this:
$ python
Python 2.7.6 (default, Apr 12 2014, 22:23:28)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import redis
>>> pool = redis.ConnectionPool(host='localhost', port=6379, db=1)
>>> r = redis.Redis(connection_pool=pool)
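>>> pipe = r.pipeline()
Because we used db=1 when setting up the database, the connection differs from the one above. Next we define a small helper function, taken from osrc, to save some typing: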
>>> def format_key(key):
...     return "{0}:{1}".format("od", key)
... 
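To see what the helper produces, here is a minimal sketch that runs without a Redis server. It assumes that the "od" prefix is simply a namespace shared by every key; the user name is the one queried in this post.

```python
def format_key(key):
    # Prefix every key with the "od" namespace, as in the session above.
    return "{0}:{1}".format("od", key)


# Key of the sorted set that holds one user's languages.
user_lang_key = format_key("user:{0}:lang".format("gmszone"))
print(user_lang_key)  # od:user:gmszone:lang
```

Keeping the prefix in one function means the namespace can be changed in a single place.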
Next we can query our own data:
>>> pipe.zcard(format_key("user:{0}:lang".format("gmszone")))
<redis.client.Pipeline object at 0x10c7f6810>
>>> pipe.execute()
[0] 
>>> pipe.zcard(format_key("user:{0}:lang".format("dfm")))
<redis.client.Pipeline object at 0x10c7f6810>
>>> pipe.execute()
[1]
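The session above shows that calling zcard on a pipeline only queues the command and hands back the pipeline object; the counts arrive when execute() runs. As a rough sketch, both lookups could be batched into a single execute() call. The function name language_counts and its client parameter are hypothetical; they are not part of redis-py or osrc.

```python
def format_key(key):
    return "{0}:{1}".format("od", key)


def language_counts(client, users):
    """Return {user: size of the user's language sorted set}."""
    pipe = client.pipeline()
    for user in users:
        # Each call only queues a ZCARD and returns the pipeline itself.
        pipe.zcard(format_key("user:{0}:lang".format(user)))
    # execute() sends the queued commands and returns one result per
    # command, in the order they were queued.
    results = pipe.execute()
    return dict(zip(users, results))


# Usage, assuming a local server and the same db=1 as above:
#   import redis
#   r = redis.StrictRedis(host='localhost', port=6379, db=1)
#   language_counts(r, ["gmszone", "dfm"])
```

Passing the client in as an argument keeps the function easy to try against a test database.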
Well, it turns out I did not commit any code on March 1 or 2. Let's try db=0:
import redis
r = redis.StrictRedis(host='localhost', port=6379, db=1)
def _format(key):
    return "{0}:{1}".format("od", key)
pipe = r.pipeline()
pipe.incr(_format("total"), 1)
pipe.execute()
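This is a painful process, though, because there is quite a lot of data.
On Mac OS you can browse the data with rdm, but with this much data that may not be practical.
So let's grit our teeth and store the data the same way osrc does.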
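import glob
import gzip
import json
import os
import re
from datetime import date

def build_db_with_redis():
    # Uses the r client and the _format() helper defined earlier.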
    year = 2014
    month = 3
    pipe = r.pipeline()
    for day in range(2, 4):
        date_re = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})-([0-9]+)\.json.gz")
        fn_template = os.path.join("march",
                                   "{year}-{month:02d}-{day:02d}-{n}.json.gz")
        kwargs = {"year": year, "month": month, "day": day, "n": "*"}
        filenames = glob.glob(fn_template.format(**kwargs))
        for filename in filenames:
            userinfo = []
            year, month, day, hour = map(int, date_re.findall(filename)[0])
            weekday = date(year=year, month=month, day=day).strftime("%w")
            with gzip.GzipFile(filename) as f:
                events = [line.decode("utf-8", errors="ignore") for line in f]
                count = len(events)
                for n, line in enumerate(events):
                    event = json.loads(line)
                    actor = event["actor"]
                    attrs = event.get("actor_attributes", {})
                    if actor is None or attrs.get("type") != "User":
                        # This was probably an anonymous event (like a gist event)
                        # or an organization event.
                        continue
                    key = actor.lower()
                    evttype = event["type"]
                    nevents = 1
                    contribution = evttype in ["IssuesEvent", "PullRequestEvent","PushEvent"]
                    pipe.incr(_format("total"), nevents)
                    pipe.hincrby(_format("day"), weekday, nevents)
                    pipe.hincrby(_format("hour"), hour, nevents)
                    pipe.zincrby(_format("user"), key, nevents)
                    pipe.zincrby(_format("event"), evttype, nevents)
                    # Event histograms.
                    pipe.hincrby(_format("event:{0}:day".format(evttype)), weekday,
                                 nevents)
                    pipe.hincrby(_format("event:{0}:hour".format(evttype)), hour,
                                 nevents)
                    # User schedule histograms.
                    pipe.hincrby(_format("user:{0}:day".format(key)), weekday, nevents)
                    pipe.hincrby(_format("user:{0}:hour".format(key)), hour, nevents)
                    # User event type histogram.
                    pipe.zincrby(_format("user:{0}:event".format(key)), evttype,
                                 nevents)
                    pipe.hincrby(_format("user:{0}:event:{1}:day".format(key,
                                                                         evttype)),
                                 weekday, nevents)
                    pipe.hincrby(_format("user:{0}:event:{1}:hour".format(key,
                                                                          evttype)),
                                 hour, nevents)
                    # Parse the name and owner of the affected repository.
                    repo = event.get("repository", {})
                    owner, name, org = (repo.get("owner"), repo.get("name"),
                                        repo.get("organization"))
                    if owner and name:
                        repo_name = "{0}/{1}".format(owner, name)
                        pipe.zincrby(_format("repo"), repo_name, nevents)
                        # Save the social graph.
                        pipe.zincrby(_format("social:user:{0}".format(key)),
                                     repo_name, nevents)
                        pipe.zincrby(_format("social:repo:{0}".format(repo_name)),
                                     key, nevents)
                        # Do we know what the language of the repository is?
                        language = repo.get("language")
                        if language:
                            # Which are the most popular languages?
                            pipe.zincrby(_format("lang"), language, nevents)
                            # Total number of pushes.
                            if evttype == "PushEvent":
                                pipe.zincrby(_format("pushes:lang"), language, nevents)
                            pipe.zincrby(_format("user:{0}:lang".format(key)),
                                         language, nevents)
                            # Who are the most important users of a language?
                            if contribution:
                                pipe.zincrby(_format("lang:{0}:user".format(language)),
                                             key, nevents)
                pipe.execute()
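The loader is long, so it can help to look at the key scheme separately from the Redis calls. The following is only a sketch under stated assumptions: counters_for_event is a hypothetical helper, not part of osrc, and it lists only the global and per-user counters, leaving out the repository, social graph, and language keys. The sample event is made up and contains only the fields the loader reads.

```python
def _format(key):
    return "{0}:{1}".format("od", key)


def counters_for_event(event, weekday, hour):
    """List the (command, key, member) counters one event would bump."""
    actor = event.get("actor")
    attrs = event.get("actor_attributes", {})
    if actor is None or attrs.get("type") != "User":
        # Anonymous or organization event: skipped, as in the loader.
        return []
    user = actor.lower()
    evttype = event["type"]
    return [
        ("incr", _format("total"), None),
        ("hincrby", _format("day"), weekday),
        ("hincrby", _format("hour"), hour),
        ("zincrby", _format("user"), user),
        ("zincrby", _format("event"), evttype),
        ("hincrby", _format("user:{0}:day".format(user)), weekday),
        ("hincrby", _format("user:{0}:hour".format(user)), hour),
        ("zincrby", _format("user:{0}:event".format(user)), evttype),
    ]


# Hypothetical event for illustration only.
sample = {
    "actor": "GMSZone",
    "actor_attributes": {"type": "User"},
    "type": "PushEvent",
}
for command, key, member in counters_for_event(sample, "1", 9):
    print("{0} {1} {2}".format(command, key, member))
```

Every counter is keyed by the lowercased user name, so one event touches both the global histograms and that user's own histograms.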
Then, in the next chapter, we can retrieve our own data:
[221.0, {'1': '50', '0': '41', '3': '13', '2': '33', '5': '28', '4': '22', '6': '34'}, [('PushEvent', 152.0), ('CreateEvent', 39.0), ('WatchEvent', 16.0), ('GollumEvent', 8.0), ('MemberEvent', 3.0), ('ForkEvent', 2.0), ('ReleaseEvent', 1.0)], 0, 0, 0, 11, [('CSS', 73.0), ('JavaScript', 60.0), ('Ruby', 12.0), ('TeX', 6.0), ('Python', 5.0), ('Java', 5.0), ('C++', 5.0), ('Assembly', 5.0), ('Emacs Lisp', 2.0), ('Arduino', 2.0), ('C', 1.0)]]
Here is a brief list of the methods used above:
incr(name, amount=1)
Increments the value of key by amount. If no key exists, the value will be initialized as amount. In effect, it is a counter that increments itself.
hincrby(name, key, amount=1)
Increment the value of key in hash name by amount
zincrby(name, value, amount=1)
Increment the score of value in sorted set name by amount
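To make the three commands concrete, here is a small in-memory model built from plain dictionaries. It is a hypothetical stand-in (TinyCounters is not a redis-py class) and only mimics the counting behavior described above, not persistence or networking. It keeps the value-before-amount order of zincrby listed here; as far as I know, redis-py 3.0 and later take the amount before the value, so check the version you have installed.

```python
from collections import defaultdict


class TinyCounters(object):
    """Hypothetical in-memory model of incr, hincrby and zincrby."""

    def __init__(self):
        self.strings = defaultdict(int)
        self.hashes = defaultdict(lambda: defaultdict(int))
        self.zsets = defaultdict(lambda: defaultdict(float))

    def incr(self, name, amount=1):
        # A missing key starts at 0, so the first call leaves it at amount.
        self.strings[name] += amount
        return self.strings[name]

    def hincrby(self, name, key, amount=1):
        # One hash per name, one integer counter per field.
        self.hashes[name][key] += amount
        return self.hashes[name][key]

    def zincrby(self, name, value, amount=1):
        # One sorted set per name; the score of each member is a float.
        self.zsets[name][value] += amount
        return self.zsets[name][value]

    def top(self, name):
        # Members ordered by score, highest first.
        return sorted(self.zsets[name].items(), key=lambda kv: -kv[1])


db = TinyCounters()
for evttype, weekday in [("PushEvent", "1"), ("PushEvent", "1"), ("WatchEvent", "3")]:
    db.incr("od:total")
    db.hincrby("od:day", weekday)
    db.zincrby("od:event", evttype)
```

A plain counter suits the total, a hash suits a fixed set of buckets such as weekdays, and a sorted set suits rankings such as the most frequent event types.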
Note that execute is what actually runs the queued commands:
>>> pipe.execute()
Now consider the data below:
[
221.0, 
{
'1': '50', 
'0': '41', 
'3': '13', 
'2': '33', 
'5': '28', 
'4': '22', 
'6': '34'
}, 
[
('PushEvent', 152.0), 
('CreateEvent', 39.0), 
('WatchEvent', 16.0), 
('GollumEvent', 8.0), 
('MemberEvent', 3.0), 
('ForkEvent', 2.0), 
('ReleaseEvent', 1.0)], 
0, 0, 0, 11,
[
('CSS', 73.0), 
('JavaScript', 60.0), 
('Ruby', 12.0), 
('TeX', 6.0), 
('Python', 5.0), 
('Java', 5.0), 
('C++', 5.0), 
('Assembly', 5.0), 
('Emacs Lisp', 2.0), 
('Arduino', 2.0), 
('C', 1.0)
]
]
This is exactly the GitHub data we need: the data for analyzing a user's activity.
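As a first look at what this result can tell us, here is a minimal sketch that pulls a few facts out of it. The interpretation is an assumption based on the key scheme used when storing: the first element is taken as the total event count, the dictionary as the weekday histogram, the first list as the event types, and the last list as the languages. The four numbers in the middle are left out because their meaning is not stated here. Weekdays follow strftime("%w"), where 0 is Sunday.

```python
# Values copied from the result shown above.
total = 221.0
by_weekday = {'1': '50', '0': '41', '3': '13', '2': '33',
              '5': '28', '4': '22', '6': '34'}
events = [('PushEvent', 152.0), ('CreateEvent', 39.0), ('WatchEvent', 16.0),
          ('GollumEvent', 8.0), ('MemberEvent', 3.0), ('ForkEvent', 2.0),
          ('ReleaseEvent', 1.0)]
languages = [('CSS', 73.0), ('JavaScript', 60.0), ('Ruby', 12.0),
             ('TeX', 6.0), ('Python', 5.0), ('Java', 5.0), ('C++', 5.0),
             ('Assembly', 5.0), ('Emacs Lisp', 2.0), ('Arduino', 2.0),
             ('C', 1.0)]

# strftime("%w") numbers the days from Sunday = 0.
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]

# Hash values come back as strings, so convert before comparing.
busiest = max(by_weekday, key=lambda d: int(by_weekday[d]))
busiest_name = WEEKDAYS[int(busiest)]

# Share of all events that are pushes.
push_share = dict(events)["PushEvent"] / total

# The language list is already ordered by score, highest first.
top_language = languages[0][0]

print(busiest_name)   # Monday
print(top_language)   # CSS
```

The weekday counts add up to 221, the same as the total, which supports reading the first element as the total number of events.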