Apache Parquet
Language-agnostic, open-source columnar file format for analytics
404 Tweets · 30 Following · 6,879 Followers

Tweets
Apache Parquet retweeted
Julien Le Dem Nov 6
PSA: If you use the page-level statistics in Parquet, please chime in on JIRA:
Apache Parquet retweeted
Raniere Silva Jul 25
Last speaker in the 's scientific room before lunch is Peter Hoffmann, talking about #Pandas and how to work with large datasets in Parquet.
Apache Parquet retweeted
Gyula Fora Jul 30
Replying to @GbrHrmnn @bol_com and 4 others
Have a look at the bucketing sink rework for the upcoming release and the Parquet writer ;)
Apache Parquet retweeted
SearchDataManagement Jul 19
What's behind Parquet's growing popularity? Perhaps its columnar orientation. Learn more about the benefits of columnar data layout.
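The columnar point is easy to see in code: a Parquet reader can project just the columns a query touches instead of decoding whole rows. A minimal sketch with pyarrow; the file and column names are made up for illustration:

```python
# Read only two columns out of a wide Parquet file; columns the query
# does not need are never decoded or pulled off disk.
import pyarrow.parquet as pq

table = pq.read_table("events.parquet", columns=["user_id", "revenue"])
print(table.num_rows, table.schema)
```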
Apache Parquet retweeted
Rajat Khandelwal 18 Jun 18
Can someone answer this -> why is the Parquet format faster than other columnar storage like HBase, Kudu, etc.?
Apache Parquet retweeted
Stefan Larsson Jun 28
I am trying to understand the advantages and disadvantages of Parquet compared to , especially once the HDF5 connector for is ready. Does anyone care to enlighten me?
Apache Parquet retweeted
Lee Blum Jul 2
My talk from the DMBI 2018 Conference at about our journey at to Analytics on is available at . Thanks everyone for attending!
Apache Parquet retweeted
Thiago de Faria 19 Apr 18
How big is the data?? Well... after filtering the collisions, they generate 12.3 PB in a month... Special ROOT format +
Apache Parquet retweeted
Lee Blum 23 Apr 18
One month from now I'll be speaking on our big data journey with and at the Conference in London. If you're there, drop by!
Apache Parquet retweeted
lucien fregosi 26 Mar 18
Great benchmark between on and . In short, Kudu is faster than Parquet for random-access queries like CRUD operations, but slower for analytics queries.
Apache Parquet retweeted
Julien Le Dem 5 Mar 18
If you’re a company using open source projects and not sure how to contribute, a release engineer would be a tremendous help. It’s hard to do this properly part time. I have a specific project in mind, if you need a hint.
Apache Parquet retweeted
Mustafa Akin ⚠️ 28 Feb 18
You do not need Spark to create Parquet files; you can use plain Java, and it can even fit in AWS Lambda for a serverless solution:
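The tweet's example is plain Java (parquet-mr); the same "no Spark needed" idea can be sketched in Python with pyarrow, which likewise fits in a small Lambda-style handler. Everything below (the handler shape, field names, output path) is an illustrative assumption, not taken from the tweet:

```python
import pyarrow as pa
import pyarrow.parquet as pq

def handler(event, context):
    # Build a small in-memory Arrow table from the incoming event payload.
    records = event.get("records", [])
    table = pa.Table.from_pylist(records)
    # Write it straight to Parquet -- no Spark cluster involved.
    pq.write_table(table, "/tmp/output.parquet")
    return {"rows": table.num_rows}
```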
Apache Parquet retweeted
nuvolatech 5 Mar 17
Learn how to use Hive views for advanced schema evolution.
Apache Parquet retweeted
Jeeva 1 Feb 18
Is there a way to get from MSSQL to as a Parquet file directly?
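One common route for this, sketched under the assumption that pandas, SQLAlchemy/pyodbc, and pyarrow are available; the connection string, table name, and output path are placeholders, not from the tweet:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://user:password@myserver/mydb?driver=ODBC+Driver+17+for+SQL+Server"
)

# Stream the table in chunks so large tables need not fit in memory at once.
writer = None
for chunk in pd.read_sql("SELECT * FROM dbo.orders", engine, chunksize=100_000):
    table = pa.Table.from_pandas(chunk, preserve_index=False)
    if writer is None:
        writer = pq.ParquetWriter("orders.parquet", table.schema)
    writer.write_table(table)
if writer is not None:
    writer.close()
```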
Apache Parquet retweeted
Lee Blum 11 Jan 18
I'll be speaking at the Conference this May in London, sharing our journey in one of our many adventures with . You're all invited!
Apache Parquet retweeted
Shubham Chaudhary 4 Jan 18
Also the file size went down from 10Gigs to 3Gigs without any compression.
Apache Parquet retweeted
Shubham Chaudhary 4 Jan 18
Working with 10 GB of CSV data. Pandas read_csv took 16 mins to load the CSV into memory. Converted to Parquet with pyarrow. It took 30 secs to read into a pyarrow table and 16 secs to convert to a pandas dataframe. 16 mins => 46 secs!
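A rough sketch of the conversion workflow the tweet describes, using pyarrow's CSV reader and Parquet writer; the file names are placeholders and the tweet does not show its exact code:

```python
import pyarrow.csv as pv
import pyarrow.parquet as pq

# One-time conversion: parse the large CSV and write it out as Parquet.
table = pv.read_csv("data.csv")
pq.write_table(table, "data.parquet")

# Subsequent loads read the columnar file and convert to pandas, which is
# where the large load-time speedup the tweet reports comes from.
table = pq.read_table("data.parquet")
df = table.to_pandas()
```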